Abstract



Suppose a given observation matrix can be decomposed as the sum of a low-rank matrix and a sparse
matrix (outliers), and the goal is to recover these individual components from the observed sum. Such
additive decompositions have applications in a variety of numerical problems including system
identification, latent variable graphical modeling, and principal components analysis. We study conditions
under which recovering such a decomposition is possible via a combination of $\ell_1$ norm and trace norm
minimization. We are specifically interested in the question of how many outliers are allowed so that
convex programming can still achieve accurate recovery, and we obtain stronger recovery guarantees than
previous studies. Moreover, we do not assume that the spatial pattern of outliers is random, which stands
in contrast to related analyses under such assumptions via matrix completion.

1 Introduction
This work studies additive decompositions of matrices into sparse (outliers) and low-rank components. Such
decompositions have found applications in a variety of numerical problems, including system identification
(Chandrasekaran et al., 2009), latent variable graphical modeling (Chandrasekaran et al., 2010), and
principal component analysis (Candès et al., 2009). In these settings, the user has an input matrix
$Y \in \mathbb{R}^{m \times n}$ which is believed to be the sum of a sparse matrix $X_S$ and a low-rank matrix $X_L$. For instance, in the
application to principal component analysis, $X_L$ represents a matrix of $m$ data points from a low-dimensional
subspace of $\mathbb{R}^n$, and is corrupted by a sparse matrix $X_S$ of errors before being observed as

$$Y = \underbrace{X_S}_{\text{(sparse)}} + \underbrace{X_L}_{\text{(low-rank)}}.$$

The goal is to recover the original data matrix $X_L$ (and the error components $X_S$) from the corrupted
observations $Y$. In the latent variable model application of Chandrasekaran et al. (2010), $Y$ represents the
precision matrix over visible nodes of a Gaussian graphical model, and $X_S$ represents the precision matrix
over the visible nodes when conditioned on the hidden nodes. In general, $Y$ may be dense as a result of
dependencies between visible nodes through the hidden nodes. However, $X_S$ will be sparse when the visible
nodes are mostly independent after conditioning on the hidden nodes, and the difference $X_L = Y - X_S$ will
be low-rank when the number of hidden nodes is small. The goal is then to infer the relevant dependency
structure from just the visible nodes and measurements of their correlations.

Even if the matrix Y is exactly the sum of a sparse matrix Xs and a low-rank matrix Xl, it may be 
impossible to identify these components from the sum. For instance, the sparse matrix Xs may be low-rank, 
or the low-rank matrix Xl may be sparse. In such cases, these components may be confused for each other, 
and thus the desired decomposition of Y may not be identifiable. Therefore, one must impose conditions on 
the sparse and low-rank components in order to guarantee their identifiability from Y. 
We present sufficient conditions under which $X_S$ and $X_L$ are identifiable from the sum $Y$. Essentially,
we require that $X_S$ not be too dense in any single row or column, and that the singular vectors of $X_L$ not
be too sparse. The levels of denseness and sparseness are considered jointly in the conditions in order to
obtain the weakest possible conditions. Under a mild strengthening of the condition, we also show that $X_S$
and $X_L$ can be recovered by solving certain convex programs, and that the solution is robust under small
perturbations of $Y$. The first program we consider is

$$\min\ \lambda\|X_S\|_{\mathrm{vec}(1)} + \|X_L\|_*$$

(subject to certain feasibility constraints such as $\|X_S + X_L - Y\| \le \epsilon$) where $\|\cdot\|_{\mathrm{vec}(1)}$ is the entry-wise
1-norm and $\|\cdot\|_*$ is the trace norm. These norms are natural convex surrogates for the sparsity of $X_S$ and the
rank of $X_L$ (Tibshirani, 1996; Fazel, 2002), which are generally intractable to optimize. We also consider
a regularized formulation

$$\min\ \frac{1}{2\mu}\|X_S + X_L - Y\|_{\mathrm{vec}(2)}^2 + \lambda\|X_S\|_{\mathrm{vec}(1)} + \|X_L\|_*$$

where $\|\cdot\|_{\mathrm{vec}(2)}$ is the Frobenius norm; such a formulation may be more suitable in certain applications and
enjoys different recovery guarantees.



1.1 Related work 



Our work closely follows that of Chandrasekaran et al. (2009), who initiated the study of rank-sparsity
incoherence and its application to matrix decompositions. There, the authors identify parameters that
characterize the incoherence of $X_S$ and $X_L$ sufficient to guarantee identifiability and recovery using convex
programs. However, their analysis of this characterization yields conditions that are significantly stronger
than those given in our present work. For instance, the allowed fraction of non-zero entries in $X_S$ is
quickly vanishing as a function of the matrix size, even under the most favorable conditions on $X_L$; our
analysis does not have this restriction and allows $X_S$ to have up to $\Omega(mn)$ non-zero entries when $X_L$ is
low-rank and has non-sparse singular vectors. Therefore, for instance, in the application to principal component
analysis, our analysis allows for up to a constant fraction of the data matrix entries to be corrupted by
noise of arbitrary magnitude, whereas the analysis of Chandrasekaran et al. requires that it decrease as a
function of the matrix dimensions. Moreover, Chandrasekaran et al. only consider exact decompositions,
which may be unrealistic in certain applications; we allow for approximate decompositions, and study the
effect of perturbations on the accuracy of the recovered components.

The application to principal component analysis with gross sparse errors was studied by Candès et al.
(2009), building on previous results and analysis techniques for the related matrix completion problem
(Candès and Recht, 2009; Gross, 2009). The sparse errors model studied by Candès et al. requires that
the support of the sparse matrix $X_S$ be random, which can be unrealistic in some settings. However, the
conditions are significantly weaker than those of Chandrasekaran et al. (2009): for instance, they allow for
$\Omega(mn)$ non-zero entries in $X_S$. Our work makes no probabilistic assumption on the sparsity pattern of $X_S$ and
instead studies purely deterministic structural conditions. The price we pay, however, is roughly a factor of
$\mathrm{rank}(X_L)$ in what is allowed for the support size of $X_S$ (relative to the probabilistic analysis of Candès et al.).
Narrowing this gap with alternative deterministic conditions is an interesting open problem. Follow-up work
to (Candès et al., 2009) studies the robustness of the recovery procedure (Zhou et al., 2010), as well as
quantitatively weaker conditions on $X_S$ (Ganesh et al., 2010), but these works are only considered under the
random support model. Our work is therefore largely complementary to these probabilistic analyses.



1.2 Outline 

We describe our main results in Section 2. In Section 3, we review a number of technical tools such as
matrix and operator norms that are used to characterize the rank-sparsity incoherence properties of the
desired decomposition. Section 4 analyzes these incoherence properties in detail, giving sufficient conditions
for identifiability as well as for certifying the (approximate) optimality of a target decomposition for our
optimization formulations. The main recovery guarantees are proved in Sections 5 and 6.



2 Main results 

Fix an observation matrix $Y \in \mathbb{R}^{m \times n}$. Our goal is to (approximately) decompose the matrix $Y$ into the sum
of a sparse matrix $X_S$ and a low-rank matrix $X_L$.



2.1 Optimization formulations 

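We consider two convex optimization problems over $(X_S, X_L) \in \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n}$. The first is the constrained
formulation (parametrized by $\lambda > 0$, $\epsilon_{\mathrm{vec}(1)} \ge 0$, and $\epsilon_* \ge 0$)

$$\begin{aligned}
\min\ & \lambda\|X_S\|_{\mathrm{vec}(1)} + \|X_L\|_* \\
\text{s.t.}\ & \|X_S + X_L - Y\|_{\mathrm{vec}(1)} \le \epsilon_{\mathrm{vec}(1)} \\
& \|X_S + X_L - Y\|_* \le \epsilon_*
\end{aligned} \tag{1}$$

where $\|\cdot\|_{\mathrm{vec}(1)}$ is the entry-wise 1-norm, and $\|\cdot\|_*$ is the trace norm (i.e., sum of singular values). The
second is the regularized formulation (with regularization parameter $\mu > 0$)

$$\min\ \frac{1}{2\mu}\|X_S + X_L - Y\|_{\mathrm{vec}(2)}^2 + \lambda\|X_S\|_{\mathrm{vec}(1)} + \|X_L\|_* \tag{2}$$

where $\|\cdot\|_{\mathrm{vec}(2)}$ is the Frobenius norm (entry-wise 2-norm).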

We also consider adding a constraint to control $\|X_L\|_{\mathrm{vec}(\infty)}$, the entry-wise $\infty$-norm of $X_L$. To (1), we
add the constraint

$$\|X_L\|_{\mathrm{vec}(\infty)} \le b$$

and to (2), we add

$$\|X_S - Y\|_{\mathrm{vec}(\infty)} \le b.$$

The parameter $b$ is intended as a natural bound for $X_L$ and is typically known in applications. For example,
in image processing, the values of interest may lie in the interval [0, 255], and hence, we may take $b = 500$ as a
relaxation of the box constraint [0, 255]. The core of our analysis does not rely on these additional constraints;
we only consider them to obtain improved robustness guarantees for recovering $X_L$, which may be important
in some applications.



2.2 Identifiability conditions 



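Our first result is a refinement of the rank-sparsity incoherence notion developed by Chandrasekaran et al.
(2009). We characterize a target decomposition of $Y$ into $Y = \bar X_S + \bar X_L$ by the projection operators to
subspaces associated with $\bar X_S$ and $\bar X_L$. Let

$$\bar\Omega = \Omega(\bar X_S) := \{X \in \mathbb{R}^{m \times n} : \mathrm{supp}(X) \subseteq \mathrm{supp}(\bar X_S)\}$$

be the space of matrices whose supports are subsets of the support of $\bar X_S$, and let $\mathcal{P}_{\bar\Omega}$ be the orthogonal
projector to $\bar\Omega$ under the inner product $\langle A, B\rangle = \mathrm{tr}(A^\top B)$, where $\mathcal{P}_{\bar\Omega}(M)$ is given by

$$[\mathcal{P}_{\bar\Omega}(M)]_{i,j} = \begin{cases} M_{i,j} & \text{if } (i,j) \in \mathrm{supp}(\bar X_S) \\ 0 & \text{otherwise.} \end{cases}$$

Furthermore, let

$$\bar T = T(\bar X_L) := \{X_1 + X_2 \in \mathbb{R}^{m \times n} : \mathrm{range}(X_1) \subseteq \mathrm{range}(\bar X_L),\ \mathrm{range}(X_2^\top) \subseteq \mathrm{range}(\bar X_L^\top)\}$$

be the span of matrices either in the row-space or column-space of $\bar X_L$, and let $\mathcal{P}_{\bar T}$ be the orthogonal projector
to $\bar T$, again, under the inner product $\langle A, B\rangle = \mathrm{tr}(A^\top B)$; this is given by

$$\mathcal{P}_{\bar T}(M) = \bar U \bar U^\top M + M \bar V \bar V^\top - \bar U \bar U^\top M \bar V \bar V^\top$$

where $\bar U \in \mathbb{R}^{m \times \bar r}$ and $\bar V \in \mathbb{R}^{n \times \bar r}$ are, respectively, matrices of left and right orthonormal singular vectors
corresponding to the non-zero singular values of $\bar X_L$, and $\bar r$ is the rank of $\bar X_L$. We will see that certain
operator norms of $\mathcal{P}_{\bar\Omega}$ and $\mathcal{P}_{\bar T}$ can be bounded in terms of structural properties of $\bar X_S$ and $\bar X_L$. The first
property measures the maximum number of non-zero entries in any row or column of $\bar X_S$: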

$$\alpha(\rho) := \max\{\rho\,\|\mathrm{sign}(\bar X_S)\|_{1\to1},\ \rho^{-1}\|\mathrm{sign}(\bar X_S)\|_{\infty\to\infty}\}$$

where $\|M\|_{p\to q} := \max\{\|Mv\|_q : v \in \mathbb{R}^n,\ \|v\|_p \le 1\}$,

$$\mathrm{sign}(M)_{i,j} = \begin{cases} -1 & \text{if } M_{i,j} < 0 \\ 0 & \text{if } M_{i,j} = 0 \\ +1 & \text{if } M_{i,j} > 0 \end{cases} \qquad \forall i \in [m],\ j \in [n],$$

and $\rho > 0$ is a balancing parameter to accommodate disparity between the number of rows and columns; a
natural choice for the balancing parameter is $\rho := \sqrt{n/m}$. We remark that $\rho$ is only a parameter for the
analysis; the optimization formulations do not directly involve $\rho$. Note that $\bar X_S$ may have $\Omega(mn)$ non-zero
entries and $\alpha(\sqrt{n/m}) = O(\sqrt{mn})$ as long as the non-zero entries of $\bar X_S$ are spread out over the entire matrix.
Conversely, a sparse matrix with just $O(m + n)$ non-zero entries could have $\alpha(\sqrt{n/m}) = \sqrt{mn}$ by having all of its non-zero
entries in just a few rows and columns.
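
The contrast between spread-out and concentrated supports can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes the sign pattern is given as a list of rows with entries in {-1, 0, +1}, and the helper name `alpha` is chosen here for illustration.

```python
import math

def alpha(sign_pattern, rho):
    """max{rho * (largest column count), (largest row count) / rho}."""
    n_cols = len(sign_pattern[0])
    max_col = max(sum(abs(row[j]) for row in sign_pattern) for j in range(n_cols))
    max_row = max(sum(abs(x) for x in row) for row in sign_pattern)
    return max(rho * max_col, max_row / rho)

m = n = 6
rho = math.sqrt(n / m)
# Spread-out support: exactly one non-zero in every row and column.
spread = [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(m)]
# Concentrated support: the same number of non-zeros, all in the first row.
concentrated = [[1 if i == 0 else 0 for j in range(n)] for i in range(m)]

alpha_spread = alpha(spread, rho)              # 1.0
alpha_concentrated = alpha(concentrated, rho)  # 6.0
```

Both patterns have six non-zero entries, yet the concentrated one has a value of the sparsity measure six times larger, which is the behavior described above.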

The second property measures the sparseness of the singular vectors of $\bar X_L$:

$$\beta(\rho) := \rho^{-1}\|\bar U \bar U^\top\|_{\mathrm{vec}(\infty)} + \rho\,\|\bar V \bar V^\top\|_{\mathrm{vec}(\infty)} + \|\bar U\|_{2\to\infty}\|\bar V\|_{2\to\infty}.$$

For instance, if the singular vectors of $\bar X_L$ are perfectly aligned with the coordinate axes, then $\beta(\rho) = \Omega(1)$.
On the other hand, if the left and right singular vectors have entries bounded by $\sqrt{c/m}$ and $\sqrt{c/n}$,
respectively, for some $c \ge 1$, then $\beta(\sqrt{n/m}) \le 3c\bar r/\sqrt{mn}$.
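
As an illustration of the two regimes, the sketch below evaluates the singular-vector sparseness measure for a rank-one example. It assumes the measure is the sum of `||U U^T||_vec(inf) / rho`, `rho * ||V V^T||_vec(inf)`, and the product of the largest row norms of `U` and `V`; the function names are hypothetical.

```python
import math

def gram_max(U):
    """Largest absolute entry of U U^T."""
    return max(abs(sum(a * b for a, b in zip(ri, rj))) for ri in U for rj in U)

def max_row_norm(U):
    """Largest Euclidean norm of a row of U (the 2 -> inf induced norm)."""
    return max(math.sqrt(sum(a * a for a in row)) for row in U)

def beta(U, V, rho):
    return gram_max(U) / rho + rho * gram_max(V) + max_row_norm(U) * max_row_norm(V)

def gamma(U, V):
    """Largest absolute entry of U V^T."""
    return max(abs(sum(a * b for a, b in zip(ru, rv))) for ru in U for rv in V)

m, n = 4, 9
rho = math.sqrt(n / m)
U_flat = [[1 / math.sqrt(m)] for _ in range(m)]   # rank one, fully spread out
V_flat = [[1 / math.sqrt(n)] for _ in range(n)]
U_axis = [[1.0]] + [[0.0] for _ in range(m - 1)]  # rank one, aligned with e_1
V_axis = [[1.0]] + [[0.0] for _ in range(n - 1)]

beta_flat = beta(U_flat, V_flat, rho)   # 3 / sqrt(m * n)
beta_axis = beta(U_axis, V_axis, rho)   # 1 / rho + rho + 1
```

The flat vectors attain the bound with c = 1 and rank one, while the axis-aligned vectors give a value of at least 3 regardless of the matrix size.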
Our main identifiability result is the following. 

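Theorem 1. If $\inf_{\rho>0} \alpha(\rho)\beta(\rho) < 1$, then $\bar\Omega \cap \bar T = \{0\}$.

Theorem 1 is an immediate consequence of the following lemma (which is restated later in the paper).

Lemma 1. For all $M \in \mathbb{R}^{m \times n}$, $\|\mathcal{P}_{\bar\Omega}(\mathcal{P}_{\bar T}(M))\|_{\mathrm{vec}(1)} \le \inf_{\rho>0} \alpha(\rho)\beta(\rho)\|M\|_{\mathrm{vec}(1)}$.

Proof of Theorem 1. Take any $M \in \bar\Omega \cap \bar T$. By Lemma 1, $\|\mathcal{P}_{\bar\Omega}(\mathcal{P}_{\bar T}(M))\|_{\mathrm{vec}(1)} \le \alpha(\rho)\beta(\rho)\|M\|_{\mathrm{vec}(1)}$ for every $\rho > 0$. On the
other hand, $\mathcal{P}_{\bar\Omega}(\mathcal{P}_{\bar T}(M)) = M$, so $\alpha(\rho)\beta(\rho) < 1$ for some $\rho > 0$ implies $\|M\|_{\mathrm{vec}(1)} = 0$, i.e., $M = 0$. □

Clearly, if $\bar\Omega \cap \bar T$ contains a matrix other than 0, then $\{(\bar X_S + M, \bar X_L - M) : M \in \bar\Omega \cap \bar T\}$ gives a family
of sparse/low-rank decompositions of $Y = \bar X_S + \bar X_L$ with at least the same sparsity and rank as $(\bar X_S, \bar X_L)$.
Conversely, if $\bar\Omega \cap \bar T = \{0\}$, then any matrix in the direct sum $\bar\Omega \oplus \bar T$ has exactly one decomposition into a
matrix $A \in \bar\Omega$ plus a matrix $B \in \bar T$, and in this sense $(\bar X_S, \bar X_L)$ is identifiable.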
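
A small failure case makes the role of the intersection concrete. The sketch below is an illustration under assumed inputs, not the paper's code: the low-rank component is `e_1 e_1^T`, which is also 1-sparse, and the sparse support is assumed to contain the entry (0, 0). Matrices are plain lists of rows.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def proj_omega(M, support):
    """Keep the entries of M indexed by `support` and zero out the rest."""
    return [[M[i][j] if (i, j) in support else 0.0 for j in range(len(M[0]))]
            for i in range(len(M))]

def proj_T(M, U, V):
    """U U^T M + M V V^T - U U^T M V V^T for U, V with orthonormal columns."""
    PU, PV = matmul(U, transpose(U)), matmul(V, transpose(V))
    A, B = matmul(PU, M), matmul(M, PV)
    C = matmul(PU, B)
    return [[a + b - c for a, b, c in zip(ra, rb, rc)] for ra, rb, rc in zip(A, B, C)]

U = [[1.0], [0.0], [0.0]]          # left singular vector of e_1 e_1^T
V = [[1.0], [0.0], [0.0]]          # right singular vector of e_1 e_1^T
support = {(0, 0), (1, 2)}         # assumed support of the sparse component
M = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

# M is fixed by both projectors, so it is a non-zero element of the intersection.
in_both = proj_omega(proj_T(M, U, V), support) == M
```

Here any multiple of `M` can be moved between the two components without changing the sum, the sparsity, or the rank, so the decomposition is not identifiable.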

Note that, as we have argued above, the condition $\inf_{\rho>0} \alpha(\rho)\beta(\rho) < 1$ may be achieved even by matrices
$\bar X_S$ with $\Omega(mn)$ non-zero entries, provided that the non-zero entries of $\bar X_S$ are sufficiently spread out, and
that $\bar X_L$ is low-rank and has singular vectors far from the coordinate bases. This is in contrast with the
conditions studied by Chandrasekaran et al. (2009). Their analysis uses a different characterization of $\bar X_S$ and
$\bar X_L$, which leads to a stronger identifiability condition in certain cases. Roughly, if $\bar X_S$ has an approximately
symmetric sparsity pattern (so $\|\mathrm{sign}(\bar X_S)\|_{1\to1} \approx \|\mathrm{sign}(\bar X_S)\|_{\infty\to\infty}$), then Chandrasekaran et al. require
$\alpha(1)\sqrt{\beta(1)} < 1$ for square $n \times n$ matrices.^1 Since $\beta(1) = \Omega(1/n)$ for any $\bar X_L \in \mathbb{R}^{n \times n}$, the condition implies
$\alpha(1)^2 = O(n)$. Therefore $\bar X_S$ must have at most $O(n)$ non-zero entries (or else $\alpha(1)^2$ becomes super-linear).
In other words, the fraction of non-zero entries allowed in $\bar X_S$ by the condition $\alpha(1)\sqrt{\beta(1)} < 1$ is quickly
vanishing as a function of $n$.

^1 Chandrasekaran et al. do not explicitly work out the non-square case, but claim that $n$ can be replaced in their
analysis by the larger matrix dimension $\max\{m, n\}$. However this does not seem possible, and the analysis there should only
lead to the quite suboptimal dimensionality dependency $\min\{m, n\}$. This is because a rectangular matrix $\bar X_L$ will have left and
right singular vectors of different dimensions and thus different allowable ranges of infinity norms.

2.3 Recovery guarantees 

Our next results are guarantees on (approximately) recovering the sparse/low-rank decomposition $(\bar X_S, \bar X_L)$
from $Y = \bar X_S + \bar X_L$ via solving either of the convex optimization problems (1) or (2). We require a mild
strengthening of the condition $\inf_{\rho>0} \alpha(\rho)\beta(\rho) < 1$, as well as appropriate settings of $\lambda > 0$ and $\mu > 0$ for our recovery
guarantees. Before continuing, we first define another property of $\bar X_L$:

$$\gamma := \|\bar U \bar V^\top\|_{\mathrm{vec}(\infty)}$$

which is approximately the same as (in fact, bounded above by) the third term in the definition of $\beta(\rho)$.
The quantities $\alpha(\rho)$, $\beta(\rho)$, and $\gamma$ are central to our analysis. Therefore we state the following proposition for
reference, which provides a more intuitive understanding of their behavior. This is the only part in which
explicit dimensional dependencies come into our analysis.

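Proposition 1. Let $m_0$ be the maximum number of non-zero entries of $\bar X_S$ per column and $n_0$ be the maximum
number of non-zero entries of $\bar X_S$ per row. Let $\bar r$ be the rank of $\bar U$ and $\bar V$. Assume further that $m_0 \le c_1 m/\bar r$
and $n_0 \le c_1 n/\bar r$ for some $c_1 \in (0, 1)$, and $\|\bar U\|_{\mathrm{vec}(\infty)} \le \sqrt{c_2/m}$ and $\|\bar V\|_{\mathrm{vec}(\infty)} \le \sqrt{c_2/n}$ for some $c_2 > 0$.
Then with $\rho = \sqrt{n/m}$, we have

$$\alpha(\rho) \le \frac{c_1}{\bar r}\sqrt{mn}, \qquad \beta(\rho) \le \frac{3c_2\bar r}{\sqrt{mn}}, \qquad \gamma \le \frac{c_2\bar r}{\sqrt{mn}}.$$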
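
The three bounds can be tabulated for concrete sizes. The sketch below assumes the bounds read `c1 * sqrt(mn) / r`, `3 * c2 * r / sqrt(mn)`, and `c2 * r / sqrt(mn)`; the sizes and constants are hypothetical and chosen only for illustration.

```python
import math

def proposition1_bounds(m, n, r, c1, c2):
    """Upper bounds on alpha(rho), beta(rho), and gamma at rho = sqrt(n / m)."""
    root = math.sqrt(m * n)
    return c1 * root / r, 3 * c2 * r / root, c2 * r / root

# Hypothetical sizes and constants with c1 * c2 = 1/50.
a_bound, b_bound, g_bound = proposition1_bounds(m=200, n=300, r=5, c1=0.02, c2=1.0)
alpha_beta = a_bound * b_bound    # 3 * c1 * c2, independent of m, n, and r
alpha_gamma = a_bound * g_bound   # c1 * c2
```

The products that enter the recovery conditions depend only on `c1 * c2`, which is why the later conditions can be stated as constraints on these two constants alone.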

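We now proceed with conditions for the regularized formulation (2). Let $E := Y - (\bar X_S + \bar X_L)$ and

$$\epsilon_{2\to2} := \|E\|_{2\to2}$$

$$\epsilon_{\mathrm{vec}(\infty)} := \|E\|_{\mathrm{vec}(\infty)} + \|\mathcal{P}_{\bar T}(E)\|_{\mathrm{vec}(\infty)}.$$

We require the following, for some $\rho > 0$ and $c > 1$:

$$\alpha(\rho)\beta(\rho) < 1 \tag{3}$$

$$\lambda \le \frac{(1 - \alpha(\rho)\beta(\rho))(1 - c\,\mu^{-1}\epsilon_{2\to2}) - c\,\alpha(\rho)\mu^{-1}\epsilon_{\mathrm{vec}(\infty)} - c\,\alpha(\rho)\gamma}{c\,\alpha(\rho)} \tag{4}$$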

c VCC(0O) /_\ 

A > c- . . — . . N ' ' > 0. (5) 

1 - a(p)f3(p) - c ■ a{p)f3{p) V ; 

For instance, if for some $\rho > 0$,

$$\alpha(\rho)\gamma \le \frac{1}{41} \quad\text{and}\quad \alpha(\rho)\beta(\rho) \le \frac{3}{41}, \tag{6}$$

then the conditions are satisfied for $c = 2$ provided that $\mu$ and $\lambda$ are chosen to satisfy

f A 2 £ VOC(00) 1 j 15 ^ 15 1 

p > max|4. £2 ^ 2 , ^-^^} and — • 7 < A < - ■ (7) 

Note that (6) can be satisfied when $c_1 \le c_2^{-1}/41$ in Proposition 1.

For the constrained formulation (1), our analysis requires the same conditions as above, except with $E$
set to 0. Note that our analysis still allows for approximate decompositions; it is only the conditions that
are formulated with $E = 0$. Specifically, we require for some $\rho > 0$ and $c > 1$:

$$\alpha(\rho)\beta(\rho) < 1 \tag{8}$$

$$\lambda \le \frac{1 - \alpha(\rho)\beta(\rho) - c\,\alpha(\rho)\gamma}{c\,\alpha(\rho)} \tag{9}$$

$$\lambda \ge \frac{c\,\gamma}{1 - \alpha(\rho)\beta(\rho) - c\,\alpha(\rho)\beta(\rho)} > 0. \tag{10}$$

For instance, if for some $\rho > 0$,

$$\alpha(\rho)\gamma \le \frac{1}{15} \quad\text{and}\quad \alpha(\rho)\beta(\rho) \le \frac{1}{5}, \tag{11}$$

then the conditions are satisfied for $c = 2$ provided that $\lambda$ is chosen to satisfy

$$5\gamma \le \lambda \le \frac{1}{3\alpha(\rho)}. \tag{12}$$

Note that (11) can be satisfied when $c_1 \le c_2^{-1}/15$ in Proposition 1.
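
The admissible range for the regularization weight in the constrained case can be computed directly. The sketch below assumes the upper bound is `(1 - ab - c * a * g) / (c * a)` and the lower bound is `c * g / (1 - ab - c * ab)`, where `ab` is the product of the sparsity and sparseness measures, `a` the sparsity measure, and `g` the third incoherence quantity; the numeric inputs are hypothetical.

```python
import math

def constrained_lambda_range(alpha, beta, gamma, c=2.0):
    """Return (lower, upper) bounds on lambda, or None if no value is admissible."""
    ab = alpha * beta
    denom = 1.0 - ab - c * ab
    if ab >= 1.0 or denom <= 0.0:
        return None
    upper = (1.0 - ab - c * alpha * gamma) / (c * alpha)
    lower = c * gamma / denom
    return (lower, upper) if lower <= upper else None

# Hypothetical values with alpha * gamma = 1/30 and alpha * beta = 1/10.
alpha, gamma = 10.0, 1.0 / 300.0
beta = 3.0 * gamma
lower, upper = constrained_lambda_range(alpha, beta, gamma)
lam = math.sqrt(lower * upper)   # geometric mean of the two endpoints
```

Taking the geometric mean of the endpoints is one simple way to pick a value strictly inside the range; when the two products sit exactly at 1/15 and 1/5 the range closes up to a single admissible value.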

In summary, Proposition 1 shows that our results can be applied even with $m_0 = \Omega(m/\bar r)$ and
$n_0 = \Omega(n/\bar r)$ outliers. In contrast, the results of Chandrasekaran et al. (2009) only apply under the
condition $\max(m_0, n_0) = O(\sqrt{\min(m, n)/\bar r})$, which is significantly stronger. Moreover, unlike the analysis
of Candès et al. (2009), we do not have to assume that $\mathrm{supp}(\bar X_S)$ is random.



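The following theorem gives our recovery guarantee for the constrained formulation (1).

Theorem 2. Fix a target pair $(\bar X_S, \bar X_L) \in \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n}$ satisfying $\|Y - (\bar X_S + \bar X_L)\|_{\mathrm{vec}(1)} \le \epsilon_{\mathrm{vec}(1)}$ and
$\|Y - (\bar X_S + \bar X_L)\|_* \le \epsilon_*$. Assume the conditions (8), (9), and (10) hold for some $\rho > 0$ and $c > 1$. Let
$(\hat X_S, \hat X_L) \in \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n}$ be the solution to the convex optimization problem (1). We have

$$\max\{\|\hat X_S - \bar X_S\|_{\mathrm{vec}(1)},\ \|\hat X_L - \bar X_L\|_{\mathrm{vec}(1)}\} \le \left(1 + \frac{2 - \alpha(\rho)\beta(\rho)}{(1 - 1/c)(1 - \alpha(\rho)\beta(\rho))}\right)\epsilon_{\mathrm{vec}(1)} + \frac{2 - \alpha(\rho)\beta(\rho)}{(1 - 1/c)(1 - \alpha(\rho)\beta(\rho))}\cdot\epsilon_*/\lambda.$$

If, in addition, for some $b \ge \|\bar X_L\|_{\mathrm{vec}(\infty)}$, either:

• the optimization problem (1) is augmented with the constraint $\|X_L\|_{\mathrm{vec}(\infty)} \le b$, or

• $\hat X_L$ is post-processed by replacing $[\hat X_L]_{i,j}$ with $\min\{\max\{[\hat X_L]_{i,j}, -b\}, b\}$ for all $i, j$,

then we also have

$$\|\hat X_L - \bar X_L\|_{\mathrm{vec}(2)} \le \min\left\{\|\hat X_L - \bar X_L\|_{\mathrm{vec}(1)},\ \sqrt{2b\,\|\hat X_L - \bar X_L\|_{\mathrm{vec}(1)}}\right\}.$$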
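
The post-processing step in the second part of the theorem is an entry-wise clip. The sketch below illustrates it on made-up matrices (the target and the estimate are hypothetical) and evaluates the Frobenius-norm bound in terms of the entry-wise 1-norm error.

```python
import math

def clip_entries(X, b):
    """Replace each entry x by min(max(x, -b), b)."""
    return [[min(max(x, -b), b) for x in row] for row in X]

def vec_norm1(X):
    return sum(abs(x) for row in X for x in row)

def vec_norm2(X):
    return math.sqrt(sum(x * x for row in X for x in row))

b = 1.0
X_bar = [[0.5, -1.0], [1.0, 0.0]]   # hypothetical target, entries within [-b, b]
X_hat = [[0.4, -7.0], [3.0, 0.1]]   # hypothetical estimate with two wild entries
X_clip = clip_entries(X_hat, b)

diff = [[h - t for h, t in zip(rh, rt)] for rh, rt in zip(X_clip, X_bar)]
err1, err2 = vec_norm1(diff), vec_norm2(diff)
bound = min(err1, math.sqrt(2 * b * err1))
```

After clipping, every entry of the error is at most `2 * b` in magnitude, which is what allows the squared Frobenius error to be bounded by `2 * b` times the 1-norm error.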



The proof of Theorem 2 is in Section 5. It is clear that if $Y = \bar X_S + \bar X_L$, then we can set $\epsilon_{\mathrm{vec}(1)} = \epsilon_* = 0$
and we obtain exact recovery: $\hat X_S = \bar X_S$ and $\hat X_L = \bar X_L$. Moreover, any perturbation $Y - (\bar X_S + \bar X_L)$
affects the accuracy of $(\hat X_S, \hat X_L)$ in entry-wise 1-norm by an amount $O(\epsilon_{\mathrm{vec}(1)} + \epsilon_*/\lambda)$. Note that here, the
parameter $\lambda$ serves to balance the entry-wise 1-norm and trace norm of the perturbation in the same way it
is used in the objective function of (1). So, for instance, if we have the simplified conditions (11), then we
may choose $\lambda = \sqrt{(5/3)\gamma/\alpha(\rho)}$ to satisfy (12), upon which the error bound becomes

$$\max\{\|\hat X_S - \bar X_S\|_{\mathrm{vec}(1)},\ \|\hat X_L - \bar X_L\|_{\mathrm{vec}(1)}\} = O\left(\epsilon_{\mathrm{vec}(1)} + \sqrt{\frac{\alpha(\rho)}{\gamma}}\,\epsilon_*\right).$$

It is possible to modify the constraints in (1) to use norms other than $\|\cdot\|_{\mathrm{vec}(1)}$ and $\|\cdot\|_*$; the analysis could
at the very least be modified by simply using standard relationships to change between norms, although this
may introduce new slack in the bounds. Finally, the second part of the theorem shows how the accuracy
of $\hat X_L$ in Frobenius norm can be improved by adding an additional constraint or by post-processing the
solution.

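Now we state our recovery guarantees for the regularized formulation (2).

Theorem 3. Fix a target pair $(\bar X_S, \bar X_L) \in \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n}$. Let $E := Y - (\bar X_S + \bar X_L)$ and

$$\epsilon_{2\to2} := \|E\|_{2\to2}$$

$$\epsilon_{\mathrm{vec}(\infty)} := \|E\|_{\mathrm{vec}(\infty)} + \|\mathcal{P}_{\bar T}(E)\|_{\mathrm{vec}(\infty)}$$

$$\epsilon'_* := \|\mathcal{P}_{\bar T}(E)\|_*.$$

Let $k := |\mathrm{supp}(\bar X_S)|$ and $\bar r := \mathrm{rank}(\bar X_L)$. Assume the conditions (3), (4), and (5) hold for some $\rho > 0$ and
$c > 1$. Let $(\hat X_S, \hat X_L) \in \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n}$ be the solution to the convex optimization problem (2) augmented with the
constraint $\|X_S - Y\|_{\mathrm{vec}(\infty)} \le b$ for some $b \ge \|\bar X_S - Y\|_{\mathrm{vec}(\infty)}$ ($b = \infty$ is allowed). Let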

2k 

r := (A + pT e vec(oo) ) ■ 1 _ ^7^77) ' ( A + 7 + M~ evec(oo)) 

+ (1 + 2/x- 1 e 2 ^ 2 ) • 2f • (^ T -^L . (A + 7 + /i-^vcctoo)) + 1 + 2/x- 1 e 2 ^ 2 ) • 



» r -(1-1/c) X A • /i + Afc • /i + 2vkt ■ fi + k ■ evccfoo) 

- as vec(l) S : , r^7 s 

1 - a(p)/3(p) 



$$\|\hat X_S - \bar X_S\|_{\mathrm{vec}(2)} \le \min\left\{\|\hat X_S - \bar X_S\|_{\mathrm{vec}(1)},\ \sqrt{2b\,\|\hat X_S - \bar X_S\|_{\mathrm{vec}(1)}}\right\}$$

\\X L -X L \U < V2f ■ \\X S - X s \\ vec( 2) +e'„+ ( r> ' (1 ~ 1/C) + 2f ) ■ /i- 



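The proof of Theorem 3 is in Section 6. As before, if $Y = \bar X_S + \bar X_L$ so $E = 0$, then we can set $\mu \to 0$
and obtain exact recovery with $\hat X_S = \bar X_S$ and $\hat X_L = \bar X_L$. When the perturbation $E$ is non-zero, we control
the accuracy of $\hat X_S$ in entry-wise 1-norm and 2-norm, and the accuracy of $\hat X_L$ in trace norm. Under the
simplified conditions (6), we can choose $\lambda = (15/82)/\alpha(\rho)$ and $\mu = \max\{4\epsilon_{2\to2},\ 2\epsilon_{\mathrm{vec}(\infty)}/(15\lambda)\}$ to satisfy
(7); this leads to the error bounds

$$\|\hat X_S - \bar X_S\|_{\mathrm{vec}(1)} = O\left(\bar r\,\alpha(\rho)\cdot\max\{\epsilon_{2\to2},\ \alpha(\rho)\epsilon_{\mathrm{vec}(\infty)}\}\right)$$

$$\|\hat X_L - \bar X_L\|_* = O\left(\sqrt{\bar r}\cdot\min\left\{\sqrt{b\,\|\hat X_S - \bar X_S\|_{\mathrm{vec}(1)}},\ \|\hat X_S - \bar X_S\|_{\mathrm{vec}(1)}\right\} + \epsilon'_* + \bar r\cdot\max\{\epsilon_{2\to2},\ \alpha(\rho)\epsilon_{\mathrm{vec}(\infty)}\}\right)$$

(here, we have used the facts $k \le \alpha(\rho)^2$, $\alpha(\rho)\lambda = \Theta(1)$, and $r' = O(\bar r)$, which also implies that $k\cdot\epsilon_{\mathrm{vec}(\infty)} =
O(\alpha(\rho)\cdot\alpha(\rho)\epsilon_{\mathrm{vec}(\infty)})$). Finally, note that if the constraint $\|X_S - Y\|_{\mathrm{vec}(\infty)} \le b$ is added (i.e., $b < \infty$), then
the requirement $b \ge \|\bar X_S - Y\|_{\mathrm{vec}(\infty)}$ can be satisfied with $b := \|\bar X_L\|_{\mathrm{vec}(\infty)} + \epsilon_{\mathrm{vec}(\infty)}$. This allows for a
possibly improved bound on $\|\hat X_L - \bar X_L\|_*$.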

2.4 Examples 

We illustrate our main results with some simple examples. 
2.4.1 Random models 

We first consider a random model for the matrices $\bar X_S$ and $\bar X_L$ (Chandrasekaran et al., 2009). Let the
support of $\bar X_S$ be chosen uniformly at random $k$ times over the $[m] \times [n]$ matrix entries (so that one entry
can be selected multiple times). The value of the entries in the chosen support can be arbitrary. With high
probability, we have

$$\|\mathrm{sign}(\bar X_S)\|_{1\to1} = O\left(\frac{k\log n}{n}\right) \quad\text{and}\quad \|\mathrm{sign}(\bar X_S)\|_{\infty\to\infty} = O\left(\frac{k\log m}{m}\right)$$

so for $\rho := \sqrt{(n\log m)/(m\log n)}$, we have

$$\alpha(\rho) = O\left(k\sqrt{\frac{(\log m)(\log n)}{mn}}\right).$$

The logarithmic factors are due to collisions in the random process. Now let $\bar U$ and $\bar V$ be chosen uniformly
at random over all families of $\bar r$ orthonormal vectors in $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively. Using arguments similar
to those in (Candès and Recht, 2009), one can show that with high probability,

$$\|\bar U\bar U^\top\|_{\mathrm{vec}(\infty)} = O\left(\frac{\bar r\log m}{m}\right), \qquad \|\bar U\|_{2\to\infty} = O\left(\sqrt{\frac{\bar r\log m}{m}}\right),$$

$$\|\bar V\bar V^\top\|_{\mathrm{vec}(\infty)} = O\left(\frac{\bar r\log n}{n}\right), \qquad \|\bar V\|_{2\to\infty} = O\left(\sqrt{\frac{\bar r\log n}{n}}\right),$$

so for the previously chosen $\rho$, we have

$$\beta(\rho) = O\left(\bar r\sqrt{\frac{(\log m)(\log n)}{mn}}\right) \quad\text{and}\quad \gamma = O\left(\bar r\sqrt{\frac{(\log m)(\log n)}{mn}}\right).$$

Therefore

$$\alpha(\rho)\beta(\rho) = O\left(\frac{k\bar r(\log m)(\log n)}{mn}\right) \quad\text{and}\quad \alpha(\rho)\gamma = O\left(\frac{k\bar r(\log m)(\log n)}{mn}\right),$$

both of which are $\ll 1$ provided that

$$k \le \delta\cdot\frac{mn}{\bar r(\log m)(\log n)}$$

for a small enough constant $\delta \in (0, 1)$. In other words, when $\bar X_L$ is low-rank, the matrix $\bar X_S$ can have nearly
a constant fraction of its entries be non-zero while still allowing for exact decomposition of $Y = \bar X_S + \bar X_L$.
Our guarantee improves over that of Chandrasekaran et al. (2009) by roughly a factor of $\Omega((mn)^{1/4})$, but is
worse by a factor of $\bar r(\log m)(\log n)$ relative to the guarantees of Candès et al. (2009) for the random model.
Therefore there is a gap between our generic deterministic analysis and a direct probabilistic analysis of this
random model, and this gap seems unavoidable with sparsity conditions based on $\alpha(\rho)$. It is an interesting
open problem to find alternative characterizations of $\mathrm{supp}(\bar X_S)$ that can narrow or close this gap.
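
The random support model is easy to simulate. The following sketch is an illustration with hypothetical sizes and a fixed seed, not an experiment from the paper: it draws `k` entries with replacement, counts the distinct selected entries per row and column, and evaluates the sparsity measure at the balancing parameter suggested for this model.

```python
import math
import random

def random_support_counts(m, n, k, seed=0):
    """Draw k grid entries uniformly with replacement. Return the largest number of
    distinct selected entries in any column, in any row, and the support size."""
    rng = random.Random(seed)
    support = {(rng.randrange(m), rng.randrange(n)) for _ in range(k)}
    col_counts = [0] * n
    row_counts = [0] * m
    for i, j in support:
        row_counts[i] += 1
        col_counts[j] += 1
    return max(col_counts), max(row_counts), len(support)

m, n, k = 50, 80, 400
max_col, max_row, distinct = random_support_counts(m, n, k)
rho = math.sqrt((n * math.log(m)) / (m * math.log(n)))
alpha = max(rho * max_col, max_row / rho)
```

Because draws can collide, `distinct` may be smaller than `k`, and the largest row and column counts exceed the averages `distinct / m` and `distinct / n`; this unevenness is the source of the logarithmic factors.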

2.4.2 Principal component analysis with outliers 

Suppose $\bar X_L$ is a matrix of $m$ data points lying in a low-dimensional subspace of $\mathbb{R}^n$, and $Z$ is a random
matrix with independent Gaussian noise entries with variance $\sigma^2$. Then $Y' = \bar X_L + Z$ is the standard model
for principal component analysis. We augment the model with a sparse noise component $\bar X_S$ to obtain
$Y = \bar X_S + \bar X_L + Z$; here, we allow the non-zero entries of $\bar X_S$ to possibly approach infinity.

According to Theorem 3, we need to estimate $\|Z\|_{2\to2}$, $\|Z\|_{\mathrm{vec}(\infty)}$, $\|\mathcal{P}_{\bar T}(Z)\|_{\mathrm{vec}(\infty)}$, and $\|\mathcal{P}_{\bar T}(Z)\|_*$. We
have the following with high probability (Davidson and Szarek, 2001):

$$\|Z\|_{2\to2} \le \sigma\sqrt{m} + \sigma\sqrt{n} + O(\sigma).$$

Using standard arguments with the rotational invariance of the Gaussian distribution, we also have

$$\|Z\|_{\mathrm{vec}(\infty)} \le O(\sigma\log(mn)) \quad\text{and}\quad \|\mathcal{P}_{\bar T}(Z)\|_{\mathrm{vec}(\infty)} \le O(\sigma\log(mn))$$

with high probability. Finally, by Lemma 5, we have

$$\|\mathcal{P}_{\bar T}(Z)\|_* \le 2\bar r\|Z\|_{2\to2} \le 2\bar r\sigma\sqrt{m} + 2\bar r\sigma\sqrt{n} + O(\bar r\sigma).$$

Suppose $(\bar X_S, \bar X_L)$ has $\alpha(\rho) \le c_1(\sqrt{mn}/\bar r)$, $\beta(\rho) = \Theta(\bar r/\sqrt{mn})$, and $\gamma = \Theta(\bar r/\sqrt{mn})$ and satisfies the
simplified condition (6). This can be achieved with $c_1 c_2 \le 1/41$ in Proposition 1. Also assume $\lambda$ and $\mu$ are chosen
to satisfy (7), and that $b \ge \|\bar X_L\|_{\mathrm{vec}(\infty)} + \epsilon_{\mathrm{vec}(\infty)}$. Then we note that $k = O(c_1 mn/\bar r^2)$, and thus have from
Theorem 3 (see the discussion thereafter):

$$\|\hat X_S - \bar X_S\|_{\mathrm{vec}(1)} = O\left(c_1\sqrt{mn}\,\max\{\sigma\sqrt{m} + \sigma\sqrt{n},\ \sigma\sqrt{mn}\log(mn)/\bar r\}\right) = O\left(\sigma c_1 mn\log(mn)/\bar r\right)$$

— Xl\\* = O ( \J baci mn log(mn) / f + fa(y/m + \fn)) + ci\/mnj , 

where we may take $b = O(\sigma\log(mn) + \|\bar X_L\|_{\mathrm{vec}(\infty)})$.

Now consider the situation where both $m, n \to \infty$, and assume that $\|\bar X_L\|_{\mathrm{vec}(\infty)}$ remains bounded. If
$c_1(\log(mn))^2 = o(1)$, which implies that we have at most $o(m/(\log(mn))^2)$ outliers per column and
$o(n/(\log(mn))^2)$ outliers per row, then

$$\|\hat X_L - \bar X_L\|_* = O(\bar r\sigma(\sqrt{m} + \sqrt{n})).$$

That is, the normalized trace norm $\|\hat X_L - \bar X_L\|_*/\sqrt{nm} \to 0$. This means that we can correctly recover
the principal components of $\bar X_L$ with both outliers and random noise, when both $m$ and $n$ are large and
$c_1(\log(mn))^2 = o(1)$ in Proposition 1.

3 Technical preliminaries 

3.1 Norms, inner products, and projections 

Our analysis involves a variety of norms of vectors, matrices (viewed as elements of a vector space as well 
as linear operators of vectors), and linear operators of matrices; we define these and related notions in this 
section. 

3.1.1 Entry- wise norms 

For any $p \in [1, \infty]$, define $\|v\|_p := (\sum_i |v_i|^p)^{1/p}$ to be the $p$-norm of a vector $v$ (with $\|v\|_\infty := \max_i |v_i|$). Also,
define $\|M\|_{\mathrm{vec}(p)} := (\sum_{i,j} |M_{i,j}|^p)^{1/p}$ to be the entry-wise $p$-norm of a matrix $M$ (again, with $\|M\|_{\mathrm{vec}(\infty)} :=
\max_{i,j} |M_{i,j}|$). Note that $\|\cdot\|_{\mathrm{vec}(2)}$ corresponds to the Frobenius norm.
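
The entry-wise norms translate directly into code. This is a minimal sketch for matrices stored as lists of rows; the function name `vec_norm` is chosen here for illustration.

```python
import math

def vec_norm(M, p):
    """Entry-wise p-norm of a matrix given as a list of rows; p may be math.inf."""
    entries = [abs(x) for row in M for x in row]
    if p == math.inf:
        return max(entries)
    return sum(x ** p for x in entries) ** (1.0 / p)

M = [[3.0, -4.0], [0.0, 0.0]]
norms = (vec_norm(M, 1), vec_norm(M, 2), vec_norm(M, math.inf))  # (7.0, 5.0, 4.0)
```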

3.1.2 Inner products, linear operators, and orthogonal projections 

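We endow $\mathbb{R}^{m \times n}$ with the inner product $\langle\cdot,\cdot\rangle$ between matrices that induces the Frobenius norm $\|\cdot\|_{\mathrm{vec}(2)}$;
this is given by $\langle M, N\rangle = \mathrm{tr}(M^\top N)$.

For a linear operator $\mathcal{T} : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$, we denote its adjoint by $\mathcal{T}^*$; this is the unique linear operator
that satisfies $\langle\mathcal{T}^*(M), N\rangle = \langle M, \mathcal{T}(N)\rangle$ for all $M \in \mathbb{R}^{m \times n}$ and $N \in \mathbb{R}^{m \times n}$ (in this work, we only consider
bounded linear operators). For any two linear operators $\mathcal{T}_1$ and $\mathcal{T}_2$, we let $\mathcal{T}_1 \circ \mathcal{T}_2$ denote their composition
as defined by $(\mathcal{T}_1 \circ \mathcal{T}_2)(M) := \mathcal{T}_1(\mathcal{T}_2(M))$.

Given a subspace $W \subseteq \mathbb{R}^{m \times n}$, we let $W^\perp$ denote its orthogonal complement, and let $\mathcal{P}_W : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$
denote the orthogonal projector to $W$ with respect to $\langle\cdot,\cdot\rangle$, i.e., the unique linear operator with range $W$
and satisfying $\mathcal{P}_W^* = \mathcal{P}_W$ and $\mathcal{P}_W \circ \mathcal{P}_W = \mathcal{P}_W$.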

3.1.3 Induced norms 

For any two vector norms $\|\cdot\|_p$ and $\|\cdot\|_q$, define $\|M\|_{p\to q} := \max_{x \ne 0} \|Mx\|_q/\|x\|_p$ to be the corresponding
induced operator norm of a matrix $M$. Our analysis uses the following special cases which have alternative
definitions:

• $\|M\|_{1\to1} = \max_j \|Me_j\|_1$,

• $\|M\|_{1\to2} = \max_j \|Me_j\|_2$,

• $\|M\|_{2\to2}$ = spectral norm of $M$ (i.e., largest singular value of $M$),

• $\|M\|_{2\to\infty} = \max_i \|M^\top e_i\|_2$, and

• $\|M\|_{\infty\to\infty} = \max_i \|M^\top e_i\|_1$.

Here, $e_i$ is the $i$th coordinate vector which has a 1 in the $i$th position and 0 elsewhere.
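
Four of the special cases above reduce to simple column or row scans. The sketch below computes them for a matrix stored as a list of rows; the function names are chosen here for illustration.

```python
import math

def norm_1_to_1(M):      # largest column 1-norm
    return max(sum(abs(x) for x in col) for col in zip(*M))

def norm_1_to_2(M):      # largest column 2-norm
    return max(math.sqrt(sum(x * x for x in col)) for col in zip(*M))

def norm_2_to_inf(M):    # largest row 2-norm
    return max(math.sqrt(sum(x * x for x in row)) for row in M)

def norm_inf_to_inf(M):  # largest row 1-norm
    return max(sum(abs(x) for x in row) for row in M)

M = [[3.0, 0.0, -4.0],
     [0.0, 2.0, 0.0]]
values = (norm_1_to_1(M), norm_1_to_2(M), norm_2_to_inf(M), norm_inf_to_inf(M))
```

For a 0/1 or sign matrix, the first and last of these count the largest number of non-zeros in a column and in a row, respectively.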

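Finally, we also consider induced operator norms of linear matrix operators $\mathcal{T} : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ (in
particular, projection operators with respect to $\langle\cdot,\cdot\rangle$). For any two matrix norms $\|\cdot\|_\diamond$ and $\|\cdot\|_\bullet$, define
$\|\mathcal{T}\|_{\diamond\to\bullet} := \max_{M \ne 0} \|\mathcal{T}(M)\|_\bullet/\|M\|_\diamond$.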

3.1.4 Other norms 

The trace norm (or nuclear norm) $\|M\|_*$ of a matrix $M$ is the sum of the singular values of $M$. We will also
make use of a hybrid matrix norm $\|\cdot\|_{\sharp(\rho)}$, parametrized by $\rho > 0$, which we define by

$$\|M\|_{\sharp(\rho)} := \max\{\rho\,\|M\|_{1\to1},\ \rho^{-1}\|M\|_{\infty\to\infty}\}.$$

Also define $\|M\|_{\flat(\rho)} := \sup_{\|N\|_{\sharp(\rho)} \le 1} \langle M, N\rangle$, i.e., the dual of $\|\cdot\|_{\sharp(\rho)}$ (see below).

3.1.5 Dual pairs 

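The matrix norm $\|\cdot\|_{\diamond'}$ is said to be dual to $\|\cdot\|_\diamond$ if, for all $M \in \mathbb{R}^{m \times n}$, $\|M\|_{\diamond'} = \sup_{\|N\|_\diamond \le 1} \langle M, N\rangle$.

Proposition 2. Fix any matrix norm $\|\cdot\|_\diamond$, and let $\|\cdot\|_{\diamond'}$ be its dual. For all $M \in \mathbb{R}^{m \times n}$ and $N \in \mathbb{R}^{m \times n}$,
we have

$$\langle M, N\rangle \le \|M\|_\diamond\|N\|_{\diamond'}.$$

Proposition 3. Fix any linear matrix operator $\mathcal{T} : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ and any pair of matrix norms $\|\cdot\|_\diamond$
and $\|\cdot\|_\bullet$. We have

$$\|\mathcal{T}\|_{\diamond\to\bullet} = \|\mathcal{T}^*\|_{\bullet'\to\diamond'}$$

where $\|\cdot\|_{\diamond'}$ is dual to $\|\cdot\|_\diamond$, and $\|\cdot\|_{\bullet'}$ is dual to $\|\cdot\|_\bullet$.

The following pairs of matrix norms are dual to each other:

1. $\|\cdot\|_{\mathrm{vec}(p)}$ and $\|\cdot\|_{\mathrm{vec}(q)}$ where $1/p + 1/q = 1$;

2. $\|\cdot\|_*$ and $\|\cdot\|_{2\to2}$;

3. $\|\cdot\|_{\sharp(\rho)}$ and $\|\cdot\|_{\flat(\rho)}$ (by definition).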

3.1.6 Some lemmas 

First we show that the $\|\cdot\|_{\sharp(\rho)}$ norm (for any $\rho > 0$) bounds the spectral norm $\|\cdot\|_{2\to2}$.

Lemma 2. For any $M \in \mathbb{R}^{m \times n}$, we have for all $\rho > 0$,

$$\|M\|_{2\to2} \le \|M\|_{\sharp(\rho)}.$$
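Proof. Let $\sigma$ be the largest singular value of $M$, and let $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$ be, respectively, associated left
and right singular vectors. Then

$$\begin{pmatrix} 0 & \rho M \\ \rho^{-1}M^\top & 0 \end{pmatrix}\begin{pmatrix} \rho^{1/2}u \\ \rho^{-1/2}v \end{pmatrix} = \begin{pmatrix} \rho^{1/2}Mv \\ \rho^{-1/2}M^\top u \end{pmatrix} = \sigma\begin{pmatrix} \rho^{1/2}u \\ \rho^{-1/2}v \end{pmatrix}.$$

Moreover, by definition of $\|\cdot\|_{1\to1}$,

$$\left\|\begin{pmatrix} 0 & \rho M \\ \rho^{-1}M^\top & 0 \end{pmatrix}\begin{pmatrix} \rho^{1/2}u \\ \rho^{-1/2}v \end{pmatrix}\right\|_1 \le \left\|\begin{pmatrix} 0 & \rho M \\ \rho^{-1}M^\top & 0 \end{pmatrix}\right\|_{1\to1}\left\|\begin{pmatrix} \rho^{1/2}u \\ \rho^{-1/2}v \end{pmatrix}\right\|_1.$$

Therefore

$$\|M\|_{2\to2} = \sigma \le \left\|\begin{pmatrix} 0 & \rho M \\ \rho^{-1}M^\top & 0 \end{pmatrix}\right\|_{1\to1} = \max\{\|\rho^{-1}M^\top\|_{1\to1},\ \|\rho M\|_{1\to1}\} = \max\{\rho^{-1}\|M\|_{\infty\to\infty},\ \rho\|M\|_{1\to1}\} = \|M\|_{\sharp(\rho)}. \qquad □$$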
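
The inequality between the spectral norm and the hybrid norm can be checked numerically. The sketch below is an illustration only: it approximates the largest singular value by power iteration (a technique substituted here for convenience, not one used in the paper) and compares it with the hybrid norm for several values of the balancing parameter.

```python
import math

def spectral_norm(M, iters=500):
    """Approximate the largest singular value by power iteration on M^T M."""
    n = len(M[0])
    v = [1.0 / math.sqrt(n)] * n
    sigma = 0.0
    for _ in range(iters):
        Mv = [sum(a * b for a, b in zip(row, v)) for row in M]
        w = [sum(M[i][j] * Mv[i] for i in range(len(M))) for j in range(n)]
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
        sigma = math.sqrt(norm_w)   # ||M^T M v|| approaches sigma^2 for unit v
    return sigma

def sharp_norm(M, rho):
    col = max(sum(abs(x) for x in c) for c in zip(*M))   # 1 -> 1 norm
    row = max(sum(abs(x) for x in r) for r in M)          # inf -> inf norm
    return max(rho * col, row / rho)

M = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0]]
sigma = spectral_norm(M)
gaps = [sharp_norm(M, rho) - sigma for rho in (0.5, 1.0, 2.0)]   # all non-negative
```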
The following lemma is the dual of Lemma 2.

Lemma 3. For any $M \in \mathbb{R}^{m \times n}$, we have for all $\rho > 0$,

$$\|M\|_{\flat(\rho)} \le \|M\|_*.$$

Proof. We know that $\|M\|_{\flat(\rho)} = \langle M, N\rangle$ for some matrix $N$ such that $\|N\|_{\sharp(\rho)} = 1$. Therefore $\|N\|_{2\to2} \le 1$
from Lemma 2, and thus using Proposition 2,

$$\|M\|_{\flat(\rho)} = \langle M, N\rangle \le \|M\|_*\|N\|_{2\to2} \le \|M\|_*. \qquad □$$

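Finally we state a lemma concerning the invertibility of a certain block-form operator used in our analysis.

Lemma 4. Fix any matrix norm $\|\cdot\|_\diamond$ on $\mathbb{R}^{m \times n}$ and linear operators $\mathcal{T}_1 : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ and $\mathcal{T}_2 : \mathbb{R}^{m \times n} \to
\mathbb{R}^{m \times n}$. Let $\mathcal{I} : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ be the identity operator, and suppose $\|\mathcal{T}_1 \circ \mathcal{T}_2\|_{\diamond\to\diamond} < 1$.

1. $\mathcal{I} - \mathcal{T}_1 \circ \mathcal{T}_2$ is invertible and satisfies

$$\|(\mathcal{I} - \mathcal{T}_1 \circ \mathcal{T}_2)^{-1}\|_{\diamond\to\diamond} \le \frac{1}{1 - \|\mathcal{T}_1 \circ \mathcal{T}_2\|_{\diamond\to\diamond}}.$$

2. The linear operator on $\mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n}$

$$\begin{pmatrix} \mathcal{I} & \mathcal{T}_1 \\ \mathcal{T}_2 & \mathcal{I} \end{pmatrix}$$

is invertible, and its inverse is given by

$$\begin{pmatrix} \mathcal{I} & \mathcal{T}_1 \\ \mathcal{T}_2 & \mathcal{I} \end{pmatrix}^{-1} = \begin{pmatrix} (\mathcal{I} - \mathcal{T}_1 \circ \mathcal{T}_2)^{-1} & -(\mathcal{I} - \mathcal{T}_1 \circ \mathcal{T}_2)^{-1} \circ \mathcal{T}_1 \\ -\mathcal{T}_2 \circ (\mathcal{I} - \mathcal{T}_1 \circ \mathcal{T}_2)^{-1} & \mathcal{I} + \mathcal{T}_2 \circ (\mathcal{I} - \mathcal{T}_1 \circ \mathcal{T}_2)^{-1} \circ \mathcal{T}_1 \end{pmatrix} = \begin{pmatrix} (\mathcal{I} - \mathcal{T}_1 \circ \mathcal{T}_2)^{-1} & -(\mathcal{I} - \mathcal{T}_1 \circ \mathcal{T}_2)^{-1} \circ \mathcal{T}_1 \\ -(\mathcal{I} - \mathcal{T}_2 \circ \mathcal{T}_1)^{-1} \circ \mathcal{T}_2 & (\mathcal{I} - \mathcal{T}_2 \circ \mathcal{T}_1)^{-1} \end{pmatrix}.$$

Proof. The first claim is a standard application of Taylor expansions. The second claim then follows from
formulae of block matrix inverses using Schur complements. □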
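
The series expansion behind the first claim can be seen in a finite-dimensional toy case. In the sketch below, ordinary 2 x 2 matrices stand in for the two operators (an assumption made only to keep the illustration small), and the inverse is approximated by a truncated sum of powers.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def neumann_inverse(A, terms=200):
    """Approximate (I - A)^{-1} by the partial sum I + A + A^2 + ... ."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total, power = identity, identity
    for _ in range(terms):
        power = matmul(power, A)
        total = [[t + p for t, p in zip(rt, rp)] for rt, rp in zip(total, power)]
    return total

T1 = [[0.2, 0.1], [0.0, 0.3]]   # hypothetical stand-in for the first operator
T2 = [[0.5, 0.0], [0.4, 0.5]]   # hypothetical stand-in for the second operator
A = matmul(T1, T2)
inv = neumann_inverse(A)
I_minus_A = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(2)] for i in range(2)]
check = matmul(I_minus_A, inv)   # close to the identity matrix

norm_A = max(sum(abs(x) for x in row) for row in A)       # inf -> inf norm of A
norm_inv = max(sum(abs(x) for x in row) for row in inv)   # inf -> inf norm of the inverse
```

In this example the induced norm of the inverse stays below `1 / (1 - norm_A)`, matching the bound in the first claim for the infinity-to-infinity norm.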






3.2 Projection operators and subdifferential sets 

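Recall the definitions of the following subspaces

$$\Omega(X_S) := \{X \in \mathbb{R}^{m \times n} : \mathrm{supp}(X) \subseteq \mathrm{supp}(X_S)\}$$

and

$$T(X_L) := \{X_1 + X_2 \in \mathbb{R}^{m \times n} : \mathrm{range}(X_1) \subseteq \mathrm{range}(X_L),\ \mathrm{range}(X_2^\top) \subseteq \mathrm{range}(X_L^\top)\}.$$

The orthogonal projectors to these spaces are given in the following proposition.

Proposition 4. Fix any $X_S \in \mathbb{R}^{m \times n}$ and $X_L \in \mathbb{R}^{m \times n}$. For any matrix $M \in \mathbb{R}^{m \times n}$,

$$[\mathcal{P}_{\Omega(X_S)}(M)]_{i,j} = \begin{cases} M_{i,j} & \text{if } (i,j) \in \mathrm{supp}(X_S) \\ 0 & \text{otherwise} \end{cases}$$

for all $1 \le i \le m$ and $1 \le j \le n$, and

$$\mathcal{P}_{T(X_L)}(M) = UU^\top M + MVV^\top - UU^\top MVV^\top$$

where $U$ and $V$ are the matrices of left and right singular vectors of $X_L$.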

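Lemma 5. Under the setting of Proposition 4,

$$\|\mathcal{P}_{\Omega(X_S)}(M)\|_{\mathrm{vec}(1)} \le \sqrt{|\mathrm{supp}(X_S)|}\,\|\mathcal{P}_{\Omega(X_S)}(M)\|_{\mathrm{vec}(2)} \le \sqrt{|\mathrm{supp}(X_S)|}\,\|M\|_{\mathrm{vec}(2)}$$

$$\|\mathcal{P}_{\Omega(X_S)}(M)\|_{\mathrm{vec}(1)} \le |\mathrm{supp}(X_S)|\,\|\mathcal{P}_{\Omega(X_S)}(M)\|_{\mathrm{vec}(\infty)} \le |\mathrm{supp}(X_S)|\,\|M\|_{\mathrm{vec}(\infty)}$$

$$\|\mathcal{P}_{T(X_L)}(M)\|_{2\to2} \le 2\|M\|_{2\to2}$$

$$\|\mathcal{P}_{T(X_L)}(M)\|_* \le 2\,\mathrm{rank}(X_L)\,\|M\|_{2\to2}$$

$$\|\mathcal{P}_{T(X_L)}(M)\|_{\mathrm{vec}(2)} \le 2\sqrt{\mathrm{rank}(X_L)}\,\|M\|_{2\to2}.$$

Proof. The first and second claims rely on the fact that $|\mathrm{supp}(\mathcal{P}_{\Omega(X_S)}(M))| \le |\mathrm{supp}(X_S)|$, as well as the
fact that $\mathcal{P}_{\Omega(X_S)}$ is an orthogonal projector with respect to the inner product that induces the $\|\cdot\|_{\mathrm{vec}(2)}$
norm. For the third claim, note that

$$\|\mathcal{P}_{T(X_L)}(M)\|_{2\to2} \le \|UU^\top M\|_{2\to2} + \|(I - UU^\top)MVV^\top\|_{2\to2} \le 2\|M\|_{2\to2}.$$

The remaining claims use a similar decomposition as the third claim as well as the fact that

$$\max\{\mathrm{rank}(UU^\top M),\ \mathrm{rank}((I - UU^\top)MVV^\top)\} \le \mathrm{rank}(X_L). \qquad □$$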

Define

$$\operatorname{sign}(X_S) \in \{-1, 0, +1\}^{m\times n}$$

to be the matrix whose $(i,j)$th entry is $\operatorname{sign}([X_S]_{i,j})$, and define

$$\operatorname{orth}(X_L) := UV^\top,$$

where $U$ and $V$, respectively, are matrices of the left and right orthonormal singular vectors of $X_L$ corresponding to non-zero singular values. The following proposition characterizes the subdifferential sets for the non-smooth norms $\|\cdot\|_{\mathrm{vec}(1)}$ and $\|\cdot\|_*$ (Watson, 1992).

Proposition 5. The subdifferential set of $X_S \mapsto \|X_S\|_{\mathrm{vec}(1)}$ is

$$\partial_{X_S}(\|X_S\|_{\mathrm{vec}(1)}) = \{G \in \mathbb{R}^{m\times n} : \|G\|_{\mathrm{vec}(\infty)} \le 1,\ \mathcal{P}_{\Omega(X_S)}(G) = \operatorname{sign}(X_S)\};$$

the subdifferential set of $X_L \mapsto \|X_L\|_*$ is

$$\partial_{X_L}(\|X_L\|_*) = \{G \in \mathbb{R}^{m\times n} : \|G\|_{2\to2} \le 1,\ \mathcal{P}_{T(X_L)}(G) = \operatorname{orth}(X_L)\}.$$
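The trace-norm part of Proposition 5 can be checked numerically. The sketch below builds one element $G$ of the subdifferential as $\operatorname{orth}(X_L)$ plus a component in $T(X_L)^\perp$ of spectral norm at most one, then tests the subgradient inequality $\|Z\|_* \ge \|X_L\|_* + \langle G, Z - X_L\rangle$ on random matrices $Z$. The dimensions, the random draws, and the particular choice of $G$ are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 7, 5, 2
X_L = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
U, s, Vt = np.linalg.svd(X_L, full_matrices=False)
U, V = U[:, :r], Vt[:r].T

# G = orth(X_L) + P_{T^perp}(W) with ||W||_{2->2} = 1, so that
# ||G||_{2->2} <= 1 and P_T(G) = orth(X_L).
W = rng.standard_normal((m, n))
W /= np.linalg.norm(W, 2)
P_perp_W = (np.eye(m) - U @ U.T) @ W @ (np.eye(n) - V @ V.T)
G = U @ V.T + P_perp_W

def trace_norm(A):
    return np.linalg.norm(A, 'nuc')

# Subgradient inequality gaps; each should be non-negative up to rounding.
gaps = []
for _ in range(200):
    Z = X_L + rng.standard_normal((m, n))
    gaps.append(trace_norm(Z) - trace_norm(X_L) - np.sum(G * (Z - X_L)))
min_gap = min(gaps)
print(min_gap >= -1e-9)
```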
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The following lemma is a simple consequence of subgradient properties. 

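Lemma 6. Fix $\lambda > 0$ and define the function $g(X_S, X_L) := \lambda\|X_S\|_{\mathrm{vec}(1)} + \|X_L\|_*$. Consider any $(\bar X_S, \bar X_L)$ in $\mathbb{R}^{m\times n} \times \mathbb{R}^{m\times n}$. If there exists $Q \in \mathbb{R}^{m\times n}$ such that: $Q$ is a subgradient of $\lambda\|X_S\|_{\mathrm{vec}(1)}$ at $X_S = \bar X_S$, $Q$ is a subgradient of $\|X_L\|_*$ at $X_L = \bar X_L$, and $\|\mathcal{P}_{\Omega(\bar X_S)^\perp}(Q)\|_{\mathrm{vec}(\infty)} \le \lambda/c$ and $\|\mathcal{P}_{T(\bar X_L)^\perp}(Q)\|_{2\to2} \le 1/c$ for some $c > 1$, then

$$g(X_S, X_L) - g(\bar X_S, \bar X_L) \ge \langle Q, X_S + X_L - \bar X_S - \bar X_L\rangle + (1 - 1/c)\left(\lambda\|\mathcal{P}_{\Omega(\bar X_S)^\perp}(X_S - \bar X_S)\|_{\mathrm{vec}(1)} + \|\mathcal{P}_{T(\bar X_L)^\perp}(X_L - \bar X_L)\|_*\right)$$

for all $(X_S, X_L) \in \mathbb{R}^{m\times n} \times \mathbb{R}^{m\times n}$.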

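Proof. Let $\bar\Omega := \Omega(\bar X_S)$, $\bar T := T(\bar X_L)$, $\Delta_S := X_S - \bar X_S$, and $\Delta_L := X_L - \bar X_L$. For any subgradient $G \in \partial_{X_S}(\lambda\|\bar X_S\|_{\mathrm{vec}(1)})$, we have $G - Q = \mathcal{P}_{\bar\Omega}(G) + \mathcal{P}_{\bar\Omega^\perp}(G) - \mathcal{P}_{\bar\Omega}(Q) - \mathcal{P}_{\bar\Omega^\perp}(Q) = \mathcal{P}_{\bar\Omega^\perp}(G) - \mathcal{P}_{\bar\Omega^\perp}(Q)$. Therefore

$$\begin{aligned}
&\lambda\|\bar X_S + \Delta_S\|_{\mathrm{vec}(1)} - \lambda\|\bar X_S\|_{\mathrm{vec}(1)} - \langle Q, \Delta_S\rangle \\
&\quad\ge \sup\{\langle G, \Delta_S\rangle - \langle Q, \Delta_S\rangle : G \in \partial_{X_S}(\lambda\|\bar X_S\|_{\mathrm{vec}(1)})\} \\
&\quad\ge \sup\{\langle G - Q, \Delta_S\rangle : G \in \partial_{X_S}(\lambda\|\bar X_S\|_{\mathrm{vec}(1)})\} \\
&\quad= \sup\{\langle \mathcal{P}_{\bar\Omega^\perp}(G) - \mathcal{P}_{\bar\Omega^\perp}(Q), \Delta_S\rangle : G \in \partial_{X_S}(\lambda\|\bar X_S\|_{\mathrm{vec}(1)})\} \\
&\quad= \sup\{\langle \mathcal{P}_{\bar\Omega^\perp}(G) - \mathcal{P}_{\bar\Omega^\perp}(Q), \mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\rangle : G \in \partial_{X_S}(\lambda\|\bar X_S\|_{\mathrm{vec}(1)})\} \\
&\quad= \sup\{\langle \mathcal{P}_{\bar\Omega^\perp}(G), \mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\rangle - \langle \mathcal{P}_{\bar\Omega^\perp}(Q), \mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\rangle : G \in \partial_{X_S}(\lambda\|\bar X_S\|_{\mathrm{vec}(1)})\} \\
&\quad= \lambda\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\|_{\mathrm{vec}(1)} - \langle \mathcal{P}_{\bar\Omega^\perp}(Q), \mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\rangle \\
&\quad\ge \lambda\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\|_{\mathrm{vec}(1)} - \|\mathcal{P}_{\bar\Omega^\perp}(Q)\|_{\mathrm{vec}(\infty)}\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\|_{\mathrm{vec}(1)} \\
&\quad\ge \lambda(1 - 1/c)\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\|_{\mathrm{vec}(1)}
\end{aligned}$$

where the second-to-last inequality uses the duality of $\|\cdot\|_{\mathrm{vec}(1)}$ and $\|\cdot\|_{\mathrm{vec}(\infty)}$ and Proposition 3. Similarly,

$$\|\bar X_L + \Delta_L\|_* - \|\bar X_L\|_* - \langle Q, \Delta_L\rangle \ge (1 - 1/c)\|\mathcal{P}_{\bar T^\perp}(\Delta_L)\|_*$$

by noting the duality of $\|\cdot\|_*$ and $\|\cdot\|_{2\to2}$. Combining these gives the desired inequality. □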



4 Rank-sparsity incoherence 

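Throughout this section, we fix a target $(\bar X_S, \bar X_L) \in \mathbb{R}^{m\times n} \times \mathbb{R}^{m\times n}$, and let $\bar\Omega := \Omega(\bar X_S)$ and $\bar T := T(\bar X_L)$. Also let $U$ and $V$ be, respectively, matrices of the left and right singular vectors of $\bar X_L$ corresponding to non-zero singular values. Recall the following structural properties of $\bar X_S$ and $\bar X_L$:

$$\alpha(\rho) := \|\operatorname{sign}(\bar X_S)\|_{\sharp(\rho)} = \max\{\rho\|\operatorname{sign}(\bar X_S)\|_{1\to1},\ \rho^{-1}\|\operatorname{sign}(\bar X_S)\|_{\infty\to\infty}\};$$

$$\beta(\rho) := \rho^{-1}\|UU^\top\|_{\mathrm{vec}(\infty)} + \rho\|VV^\top\|_{\mathrm{vec}(\infty)} + \|U\|_{2\to\infty}\|V\|_{2\to\infty};$$

$$\gamma := \|\operatorname{orth}(\bar X_L)\|_{\mathrm{vec}(\infty)} = \|UV^\top\|_{\mathrm{vec}(\infty)}.$$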



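The parameter $\rho$ is a balancing parameter to handle disparity between row and column dimensions. The quantity $\alpha(\rho)$ is a $\rho$-weighted maximum number of non-zero entries in any single row or column of $\bar X_S$: it is the larger of $\rho$ times the maximum number of non-zero entries in a column and $\rho^{-1}$ times the maximum number of non-zero entries in a row. The quantities $\beta(\rho)$ and $\gamma$ measure the coherence of the singular vectors of $\bar X_L$, that is, the alignment of the singular vectors with the coordinate bases. For instance, under the conditions of Proposition 1, we have (with $\rho = \sqrt{n/m}$)

$$\alpha(\rho) \le c_1\sqrt{mn}, \qquad \beta(\rho) \le \frac{3c_2\operatorname{rank}(\bar X_L)}{\sqrt{mn}}, \qquad \text{and} \qquad \gamma \le \frac{c_2\operatorname{rank}(\bar X_L)}{\sqrt{mn}}$$

for some constants $c_1$ and $c_2$.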
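The three quantities can be computed directly from their definitions. The following sketch assumes NumPy; the function names `alpha` and `beta_gamma`, the rank tolerance, and the example pair (an all-ones low-rank matrix and a sparse matrix with five diagonal entries) are illustrative choices rather than anything prescribed by the paper.

```python
import numpy as np

def alpha(X_S, rho):
    S = (X_S != 0).astype(float)      # entrywise |sign(X_S)|
    col_max = S.sum(axis=0).max()     # ||sign(X_S)||_{1->1}: max column count
    row_max = S.sum(axis=1).max()     # ||sign(X_S)||_{inf->inf}: max row count
    return max(rho * col_max, row_max / rho)

def beta_gamma(X_L, rho, tol=1e-10):
    U, s, Vt = np.linalg.svd(X_L, full_matrices=False)
    r = int(np.sum(s > tol))
    U, V = U[:, :r], Vt[:r].T
    two_to_inf = lambda A: np.linalg.norm(A, axis=1).max()  # largest row norm
    beta = (np.abs(U @ U.T).max() / rho + rho * np.abs(V @ V.T).max()
            + two_to_inf(U) * two_to_inf(V))
    gamma = np.abs(U @ V.T).max()
    return beta, gamma

m = n = 20
X_L = np.ones((m, n))                 # rank one, maximally incoherent
X_S = np.zeros((m, n))
X_S[np.arange(5), np.arange(5)] = [2.0, -1.0, 3.0, -4.0, 1.5]

a = alpha(X_S, 1.0)
b, g = beta_gamma(X_L, 1.0)
print(a, b, g)   # approximately 1.0, 0.15, 0.05
```

For this pair $\alpha(1)\beta(1) \approx 0.15$, comfortably below the threshold of 1 used throughout this section.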
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4.1 Operator norms of projection operators 

We show that under the condition $\inf_{\rho>0} \alpha(\rho)\beta(\rho) < 1$, the pair $(\bar X_S, \bar X_L)$ is identifiable from its sum $\bar X_S + \bar X_L$ (Theorem 1). This is achieved by proving that the composition of projection operators $\mathcal{P}_{\bar\Omega}$ and $\mathcal{P}_{\bar T}$ is a contraction as per Lemma 1, which in turn implies that $\bar\Omega \cap \bar T = \{0\}$.

The following two lemmas bound the projection operators $\mathcal{P}_{\bar\Omega}$ and $\mathcal{P}_{\bar T}$ in complementary norms.

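Lemma 7. For any $M \in \mathbb{R}^{m\times n}$ and $p \in \{1, \infty\}$, we have

$$\|\mathcal{P}_{\bar\Omega}(M)\|_{p\to p} \le \|\operatorname{sign}(\bar X_S)\|_{p\to p}\|M\|_{\mathrm{vec}(\infty)}.$$

This implies, for all $\rho > 0$,

$$\|\mathcal{P}_{\bar\Omega}\|_{\mathrm{vec}(\infty)\to\sharp(\rho)} \le \alpha(\rho).$$

Proof. Define $s(\bar X_S) \in \{0,1\}^{m\times n}$ to be the entry-wise absolute value of $\operatorname{sign}(\bar X_S)$. We have

$$\begin{aligned}
\|\mathcal{P}_{\bar\Omega}(M)\|_{p\to p} &= \max\{\|\mathcal{P}_{\bar\Omega}(M)v\|_p : \|v\|_p \le 1\} \\
&\le \|\mathcal{P}_{\bar\Omega}(M)\|_{\mathrm{vec}(\infty)} \max\{\|s(\mathcal{P}_{\bar\Omega}(M))v\|_p : \|v\|_p \le 1\} \\
&\le \|M\|_{\mathrm{vec}(\infty)} \max\{\|s(\bar X_S)v\|_p : \|v\|_p \le 1\} \\
&= \|M\|_{\mathrm{vec}(\infty)}\|\operatorname{sign}(\bar X_S)\|_{p\to p}.
\end{aligned}$$

The second part follows from the definitions of $\|\cdot\|_{\sharp(\rho)}$ and $\alpha(\rho)$. □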

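Lemma 8. For any $M \in \mathbb{R}^{m\times n}$, we have

$$\|\mathcal{P}_{\bar T}(M)\|_{\mathrm{vec}(\infty)} \le \|UU^\top\|_{\mathrm{vec}(\infty)}\|M\|_{1\to1} + \|VV^\top\|_{\mathrm{vec}(\infty)}\|M\|_{\infty\to\infty} + \|U\|_{2\to\infty}\|V\|_{2\to\infty}\|M\|_{2\to2}.$$

This implies, for all $\rho > 0$,

$$\|\mathcal{P}_{\bar T}\|_{\sharp(\rho)\to\mathrm{vec}(\infty)} \le \beta(\rho).$$

Proof. We have $\|\mathcal{P}_{\bar T}(M)\|_{\mathrm{vec}(\infty)} = \|UU^\top M + MVV^\top - UU^\top MVV^\top\|_{\mathrm{vec}(\infty)} \le \|UU^\top M\|_{\mathrm{vec}(\infty)} + \|MVV^\top\|_{\mathrm{vec}(\infty)} + \|UU^\top MVV^\top\|_{\mathrm{vec}(\infty)}$ by the triangle inequality. The bounds for each term now follow from the definitions:

$$\begin{aligned}
\|UU^\top M\|_{\mathrm{vec}(\infty)} &= \max_i \|M^\top UU^\top e_i\|_\infty \\
&\le \|M^\top\|_{\infty\to\infty} \max_i \|UU^\top e_i\|_\infty \\
&= \|M\|_{1\to1}\|UU^\top\|_{\mathrm{vec}(\infty)};
\end{aligned}$$

$$\begin{aligned}
\|MVV^\top\|_{\mathrm{vec}(\infty)} &= \max_j \|MVV^\top e_j\|_\infty \\
&\le \|M\|_{\infty\to\infty} \max_j \|VV^\top e_j\|_\infty \\
&= \|M\|_{\infty\to\infty}\|VV^\top\|_{\mathrm{vec}(\infty)};
\end{aligned}$$

and

$$\begin{aligned}
\|UU^\top MVV^\top\|_{\mathrm{vec}(\infty)} &= \max_{i,j} |e_i^\top U(U^\top MV)V^\top e_j| \\
&\le \max_{i,j} \|U^\top e_i\|_2 \|U^\top MV\|_{2\to2} \|V^\top e_j\|_2 && \text{(Cauchy-Schwarz)} \\
&\le \|M\|_{2\to2}\|U\|_{2\to\infty}\|V\|_{2\to\infty} \\
&\le \|M\|_{\sharp(\rho)}\|U\|_{2\to\infty}\|V\|_{2\to\infty} && \text{(Lemma 2)}.
\end{aligned}$$

The second part now follows from the definition of $\beta(\rho)$, since $\|M\|_{1\to1} \le \rho^{-1}\|M\|_{\sharp(\rho)}$ and $\|M\|_{\infty\to\infty} \le \rho\|M\|_{\sharp(\rho)}$. □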
Now we show that the composition of $\mathcal{P}_{\bar\Omega}$ and $\mathcal{P}_{\bar T}$ gives a contraction under certain norms and their duals.

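Lemma 9. For all $\rho > 0$,

1. $\|\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T}\|_{\sharp(\rho)\to\sharp(\rho)} \le \alpha(\rho)\beta(\rho)$;

2. $\|\mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega}\|_{\mathrm{vec}(\infty)\to\mathrm{vec}(\infty)} \le \alpha(\rho)\beta(\rho)$.

Proof. Immediate from Lemma 7 and Lemma 8. □

Lemma 10. For all $\rho > 0$,

1. $\|\mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega}\|_{\flat(\rho)\to\flat(\rho)} \le \alpha(\rho)\beta(\rho)$;

2. $\|\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T}\|_{\mathrm{vec}(1)\to\mathrm{vec}(1)} \le \alpha(\rho)\beta(\rho)$.

Proof. First note that $(\mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega})^* = \mathcal{P}_{\bar\Omega}^* \circ \mathcal{P}_{\bar T}^* = \mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T}$ because $\mathcal{P}_{\bar\Omega}$ and $\mathcal{P}_{\bar T}$ are self-adjoint, and similarly $(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T})^* = \mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega}$. Now the claim follows by Proposition 3 and Lemma 9, using the facts that $\|\cdot\|_{\flat(\rho)}$ is dual to $\|\cdot\|_{\sharp(\rho)}$ and that $\|\cdot\|_{\mathrm{vec}(1)}$ is dual to $\|\cdot\|_{\mathrm{vec}(\infty)}$. □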
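The contraction bounds can be verified exactly on a small instance, because the $\mathrm{vec}(\infty)\to\mathrm{vec}(\infty)$ norm of a linear map is the largest absolute row sum of its matrix and the $\mathrm{vec}(1)\to\mathrm{vec}(1)$ norm is the largest absolute column sum. The sketch below builds those matrices explicitly; the $6 \times 6$ example, the choice $\rho = 1$, and the helper `matrix_of` are assumptions made for illustration.

```python
import numpy as np

m = n = 6
rho = 1.0
X_S = np.zeros((m, n))
X_S[[0, 1, 2], [0, 1, 2]] = 1.0           # three non-zeros, one per row/column
U = np.ones((m, 1)) / np.sqrt(m)          # singular vectors of the all-ones matrix
V = np.ones((n, 1)) / np.sqrt(n)
mask = X_S != 0

def P_Omega(M):
    return np.where(mask, M, 0.0)

def P_T(M):
    return U @ (U.T @ M) + (M @ V) @ V.T - U @ (U.T @ M @ V) @ V.T

def matrix_of(op):
    """Matrix of a linear map on R^{m x n} in the vectorized standard basis."""
    cols = []
    for k in range(m * n):
        E = np.zeros(m * n)
        E[k] = 1.0
        cols.append(op(E.reshape(m, n)).ravel())
    return np.column_stack(cols)

S = mask.astype(float)
alpha = max(rho * S.sum(axis=0).max(), S.sum(axis=1).max() / rho)
beta = (np.abs(U @ U.T).max() / rho + rho * np.abs(V @ V.T).max()
        + np.linalg.norm(U, axis=1).max() * np.linalg.norm(V, axis=1).max())

A = matrix_of(lambda M: P_T(P_Omega(M)))    # P_T o P_Omega
B = matrix_of(lambda M: P_Omega(P_T(M)))    # P_Omega o P_T
norm_inf_inf = np.abs(A).sum(axis=1).max()  # vec(inf) -> vec(inf)
norm_1_1 = np.abs(B).sum(axis=0).max()      # vec(1) -> vec(1)
print(norm_inf_inf <= alpha * beta, norm_1_1 <= alpha * beta)
```

Since the two compositions are adjoint to each other, `B` is the transpose of `A`, which is the duality argument of Lemma 10 in matrix form.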

Note that Lemma 1 is encompassed by Lemma 10. Another consequence of these contraction properties is the following uncertainty principle, analogous to one stated by Chandrasekaran et al. (2009), which effectively states that a matrix $X$ cannot have both $\|\operatorname{sign}(X)\|_{\sharp(\rho)}$ and $\|\operatorname{orth}(X)\|_{\mathrm{vec}(\infty)}$ simultaneously small.

Theorem 4. If $\bar X = \bar X_S = \bar X_L \ne 0$, then $\inf_{\rho>0} \alpha(\rho)\beta(\rho) \ge 1$.

Proof. Note that the non-zero element $\bar X$ lives in $\bar\Omega \cap \bar T$, so we get the conclusion by the contrapositive of Theorem 1. □
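To illustrate the uncertainty principle, the sketch below evaluates $\alpha(\rho)\beta(\rho)$ over a grid of $\rho$ for a single matrix that plays both roles at once: it is sparse and has rank one. The example matrix, the grid, and the function name `alpha_beta_product` are illustrative assumptions.

```python
import numpy as np

def alpha_beta_product(X, rho, tol=1e-10):
    """alpha(rho) * beta(rho) when the same X serves as both X_S and X_L."""
    S = (X != 0).astype(float)
    alpha = max(rho * S.sum(axis=0).max(), S.sum(axis=1).max() / rho)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol))
    U, V = U[:, :r], Vt[:r].T
    beta = (np.abs(U @ U.T).max() / rho + rho * np.abs(V @ V.T).max()
            + np.linalg.norm(U, axis=1).max() * np.linalg.norm(V, axis=1).max())
    return alpha * beta

X = np.zeros((5, 7))
X[:2, :2] = 1.0   # a 2 x 2 block of ones: four non-zeros and rank one

products = [alpha_beta_product(X, rho) for rho in np.logspace(-2, 2, 41)]
print(min(products) >= 1.0)
```

On this grid the product never drops below 1, as Theorem 4 requires for any non-zero matrix that is its own sparse and low-rank part.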

4.2 Dual certificate 

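The incoherence properties allow us to construct an approximate dual certificate $(Q_{\bar\Omega}, Q_{\bar T}) \in \bar\Omega \times \bar T$ that is central to the analysis of the optimization problems (1) and (2).

The certificate is constructed as the solution to the linear system

$$\begin{aligned}
\mathcal{P}_{\bar\Omega}(Q_{\bar\Omega} + Q_{\bar T} + \mu^{-1}E) &= \lambda\operatorname{sign}(\bar X_S) \\
\mathcal{P}_{\bar T}(Q_{\bar\Omega} + Q_{\bar T} + \mu^{-1}E) &= \operatorname{orth}(\bar X_L)
\end{aligned}$$

for some matrix $E \in \mathbb{R}^{m\times n}$; this can be equivalently written as

$$\begin{bmatrix} \mathcal{I} & \mathcal{P}_{\bar\Omega} \\ \mathcal{P}_{\bar T} & \mathcal{I} \end{bmatrix} \begin{bmatrix} Q_{\bar\Omega} \\ Q_{\bar T} \end{bmatrix} = \begin{bmatrix} \lambda\operatorname{sign}(\bar X_S) - \mu^{-1}\mathcal{P}_{\bar\Omega}(E) \\ \operatorname{orth}(\bar X_L) - \mu^{-1}\mathcal{P}_{\bar T}(E) \end{bmatrix}.$$

We show the existence of the dual certificate $(Q_{\bar\Omega}, Q_{\bar T})$ under the conditions (3), (4), and (5) relative to an arbitrary matrix $E$. Recall that the recovery guarantees for the constrained formulation require the conditions with $E = 0$, while the guarantees for the regularized formulation take $E = Y - (\bar X_S + \bar X_L)$.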
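The linear system can be solved numerically by alternating between its two equations, which converges when the composition of the projectors is a contraction. The sketch below does this for $E = 0$ on a small incoherent example; the matrices, the value $\lambda = 0.3$, the iteration count, and the use of a fixed-point iteration (rather than the closed-form inverse used in the analysis) are all assumptions made for illustration.

```python
import numpy as np

m = n = 20
lam = 0.3   # hypothetical regularization weight

X_S = np.zeros((m, n))
X_S[np.arange(5), np.arange(5)] = [2.0, -1.0, 3.0, -4.0, 1.5]
U = np.ones((m, 1)) / np.sqrt(m)   # singular vectors of the all-ones X_L
V = np.ones((n, 1)) / np.sqrt(n)
mask = X_S != 0

def P_Omega(M):
    return np.where(mask, M, 0.0)

def P_T(M):
    return U @ (U.T @ M) + (M @ V) @ V.T - U @ (U.T @ M @ V) @ V.T

a = lam * np.sign(X_S)   # lambda * sign(X_S), the first right-hand side
b = U @ V.T              # orth(X_L), the second right-hand side

# Alternate between the two block equations
#   Q_Omega + P_Omega(Q_T) = a   and   P_T(Q_Omega) + Q_T = b.
Q_Omega = np.zeros((m, n))
Q_T = np.zeros((m, n))
for _ in range(100):
    Q_Omega = a - P_Omega(Q_T)
    Q_T = b - P_T(Q_Omega)

Q = Q_Omega + Q_T
print(np.allclose(P_Omega(Q), a), np.allclose(P_T(Q), b))
print(np.abs(Q - P_Omega(Q)).max(), np.linalg.norm(Q - P_T(Q), 2))
```

For this instance $\alpha(1) = 1$, $\beta(1) = 0.15$, and $\gamma = 0.05$, so the off-support and off-tangent-space parts printed on the last line are small relative to $\lambda$ and 1 respectively.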

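Theorem 5. Pick any $c > 1$, $\rho > 0$, and $E \in \mathbb{R}^{m\times n}$. Let $k := |\operatorname{supp}(\bar X_S)|$ and $\bar r := \operatorname{rank}(\bar X_L)$. Let

$$\begin{aligned}
\epsilon_{2\to2} &:= \|E\|_{2\to2} \\
\epsilon_{\mathrm{vec}(\infty)} &:= \|E\|_{\mathrm{vec}(\infty)} + \|\mathcal{P}_{\bar T}(E)\|_{\mathrm{vec}(\infty)}.
\end{aligned}$$

If the following conditions hold:

$$\alpha(\rho)\beta(\rho) < 1 \tag{13}$$

$$\lambda \le \frac{(1 - \alpha(\rho)\beta(\rho))(1 - c\cdot\mu^{-1}\epsilon_{2\to2}) - c\cdot\alpha(\rho)\mu^{-1}\epsilon_{\mathrm{vec}(\infty)} - c\cdot\alpha(\rho)\gamma}{c\cdot\alpha(\rho)} \tag{14}$$

$$\lambda \ge c\cdot\frac{\gamma + \mu^{-1}(2 - \alpha(\rho)\beta(\rho))\epsilon_{\mathrm{vec}(\infty)}}{1 - \alpha(\rho)\beta(\rho) - c\cdot\alpha(\rho)\beta(\rho)} > 0 \tag{15}$$

(these are a restatement of (3), (4), and (5)), then

$$\begin{aligned}
Q_{\bar\Omega} &:= (\mathcal{I} - \mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T})^{-1}\left(\lambda\operatorname{sign}(\bar X_S) - \mathcal{P}_{\bar\Omega}(\operatorname{orth}(\bar X_L)) - \mu^{-1}(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(E)\right) \in \bar\Omega \quad \text{and} \\
Q_{\bar T} &:= (\mathcal{I} - \mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega})^{-1}\left(\operatorname{orth}(\bar X_L) - \lambda\mathcal{P}_{\bar T}(\operatorname{sign}(\bar X_S)) - \mu^{-1}(\mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega^\perp})(E)\right) \in \bar T
\end{aligned}$$

are well-defined and satisfy

$$\begin{aligned}
\mathcal{P}_{\bar\Omega}(Q_{\bar\Omega} + Q_{\bar T} + \mu^{-1}E) &= \lambda\operatorname{sign}(\bar X_S) \\
\mathcal{P}_{\bar T}(Q_{\bar\Omega} + Q_{\bar T} + \mu^{-1}E) &= \operatorname{orth}(\bar X_L)
\end{aligned}$$

and

$$\begin{aligned}
\|\mathcal{P}_{\bar\Omega^\perp}(Q_{\bar\Omega} + Q_{\bar T} + \mu^{-1}E)\|_{\mathrm{vec}(\infty)} &\le \lambda/c \\
\|\mathcal{P}_{\bar T^\perp}(Q_{\bar\Omega} + Q_{\bar T} + \mu^{-1}E)\|_{2\to2} &\le 1/c.
\end{aligned}$$

Moreover,

$$\begin{aligned}
\|Q_{\bar T}\|_{2\to2} &\le \frac{2\alpha(\rho)}{1 - \alpha(\rho)\beta(\rho)}\cdot(\lambda + \gamma + \mu^{-1}\epsilon_{\mathrm{vec}(\infty)}) + 1 + 2\mu^{-1}\epsilon_{2\to2} \\
\|Q_{\bar T}\|_* &\le 2\bar r\|Q_{\bar T}\|_{2\to2} \\
\|Q_{\bar T}\|_{\mathrm{vec}(\infty)} &\le \frac{1}{1 - \alpha(\rho)\beta(\rho)}\cdot(\lambda + \gamma + \mu^{-1}\epsilon_{\mathrm{vec}(\infty)}) \\
\|Q_{\bar\Omega}\|_{\mathrm{vec}(\infty)} &\le \frac{2}{1 - \alpha(\rho)\beta(\rho)}\cdot(\lambda + \gamma + \mu^{-1}\epsilon_{\mathrm{vec}(\infty)}) \\
\|Q_{\bar\Omega}\|_{\mathrm{vec}(1)} &\le k\|Q_{\bar\Omega}\|_{\mathrm{vec}(\infty)} \\
\|Q_{\bar\Omega} + Q_{\bar T}\|_{\mathrm{vec}(2)}^2 &\le \lambda\|Q_{\bar\Omega}\|_{\mathrm{vec}(1)}\left(1 + \mu^{-1}\lambda^{-1}\epsilon_{\mathrm{vec}(\infty)}\right) + \|Q_{\bar T}\|_*\left(1 + 2\mu^{-1}\epsilon_{2\to2}\right).
\end{aligned}$$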

Remark 1. The dual certificate constitutes an approximate subgradient in the sense that $Q_{\bar\Omega} + Q_{\bar T} + \mu^{-1}E$ is a subgradient of both $\lambda\|X_S\|_{\mathrm{vec}(1)}$ at $X_S = \bar X_S$, and $\|X_L\|_*$ at $X_L = \bar X_L$.

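Proof. Under the condition (13), we have $\alpha(\rho)\beta(\rho) < 1$, and therefore Lemma 9 and Lemma 4 imply that the operators $\mathcal{I} - \mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T}$ and $\mathcal{I} - \mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega}$ are invertible and satisfy

$$\max\left\{\|(\mathcal{I} - \mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T})^{-1}\|_{\sharp(\rho)\to\sharp(\rho)},\ \|(\mathcal{I} - \mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega})^{-1}\|_{\mathrm{vec}(\infty)\to\mathrm{vec}(\infty)}\right\} \le \frac{1}{1 - \alpha(\rho)\beta(\rho)}.$$

Thus $Q_{\bar\Omega}$ and $Q_{\bar T}$ are well-defined. We can bound $\|Q_{\bar\Omega}\|_{2\to2}$ as

$$\begin{aligned}
\|Q_{\bar\Omega}\|_{2\to2} &\le \|Q_{\bar\Omega}\|_{\sharp(\rho)} && \text{(Lemma 2)} \\
&= \left\|(\mathcal{I} - \mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T})^{-1}\left(\lambda\operatorname{sign}(\bar X_S) - \mathcal{P}_{\bar\Omega}(\operatorname{orth}(\bar X_L)) - \mu^{-1}(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(E)\right)\right\|_{\sharp(\rho)} \\
&\le \frac{1}{1 - \alpha(\rho)\beta(\rho)}\cdot\left\|\lambda\operatorname{sign}(\bar X_S) - \mathcal{P}_{\bar\Omega}(\operatorname{orth}(\bar X_L)) - \mu^{-1}(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(E)\right\|_{\sharp(\rho)} \\
&\le \frac{1}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\lambda\|\operatorname{sign}(\bar X_S)\|_{\sharp(\rho)} + \|\mathcal{P}_{\bar\Omega}(\operatorname{orth}(\bar X_L))\|_{\sharp(\rho)} + \mu^{-1}\|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(E)\|_{\sharp(\rho)}\right) \\
&\le \frac{\alpha(\rho)}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\lambda + \gamma + \mu^{-1}\|\mathcal{P}_{\bar T^\perp}(E)\|_{\mathrm{vec}(\infty)}\right) && \text{(Lemma 7)} \\
&\le \frac{\alpha(\rho)}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\lambda + \gamma + \mu^{-1}\epsilon_{\mathrm{vec}(\infty)}\right).
\end{aligned}$$

Above, we have used the bound $\|\mathcal{P}_{\bar T^\perp}(E)\|_{\mathrm{vec}(\infty)} = \|E - \mathcal{P}_{\bar T}(E)\|_{\mathrm{vec}(\infty)} \le \epsilon_{\mathrm{vec}(\infty)}$. Therefore,

$$\begin{aligned}
\|\mathcal{P}_{\bar T^\perp}(Q_{\bar\Omega} + \mu^{-1}E)\|_{2\to2} &\le \|(I - UU^\top)Q_{\bar\Omega}(I - VV^\top)\|_{2\to2} + \mu^{-1}\|\mathcal{P}_{\bar T^\perp}(E)\|_{2\to2} \\
&\le \|Q_{\bar\Omega}\|_{2\to2} + \mu^{-1}\epsilon_{2\to2} \\
&\le \frac{\alpha(\rho)}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\lambda + \gamma + \mu^{-1}\epsilon_{\mathrm{vec}(\infty)}\right) + \mu^{-1}\epsilon_{2\to2}.
\end{aligned}$$

The condition (14) now implies that this quantity is at most $1/c$.

Now we bound $\|Q_{\bar T}\|_{\mathrm{vec}(\infty)}$ as

$$\begin{aligned}
\|Q_{\bar T}\|_{\mathrm{vec}(\infty)} &= \left\|(\mathcal{I} - \mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega})^{-1}\left(\operatorname{orth}(\bar X_L) - \lambda\mathcal{P}_{\bar T}(\operatorname{sign}(\bar X_S)) - \mu^{-1}(\mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega^\perp})(E)\right)\right\|_{\mathrm{vec}(\infty)} \\
&\le \frac{1}{1 - \alpha(\rho)\beta(\rho)}\cdot\left\|\operatorname{orth}(\bar X_L) - \lambda\mathcal{P}_{\bar T}(\operatorname{sign}(\bar X_S)) - \mu^{-1}(\mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega^\perp})(E)\right\|_{\mathrm{vec}(\infty)} \\
&\le \frac{1}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\|\operatorname{orth}(\bar X_L)\|_{\mathrm{vec}(\infty)} + \lambda\|\mathcal{P}_{\bar T}(\operatorname{sign}(\bar X_S))\|_{\mathrm{vec}(\infty)} + \mu^{-1}\|(\mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega^\perp})(E)\|_{\mathrm{vec}(\infty)}\right) \\
&\le \frac{1}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\gamma + \lambda\alpha(\rho)\beta(\rho) + \mu^{-1}\epsilon_{\mathrm{vec}(\infty)}\right) && \text{(Lemma 8)}.
\end{aligned}$$

Above, we have used the bound $\|(\mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega^\perp})(E)\|_{\mathrm{vec}(\infty)} = \|\mathcal{P}_{\bar T}(E) - (\mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega})(E)\|_{\mathrm{vec}(\infty)} \le \|\mathcal{P}_{\bar T}(E)\|_{\mathrm{vec}(\infty)} + \alpha(\rho)\beta(\rho)\|E\|_{\mathrm{vec}(\infty)} \le \epsilon_{\mathrm{vec}(\infty)}$. Therefore,

$$\begin{aligned}
\|\mathcal{P}_{\bar\Omega^\perp}(Q_{\bar T} + \mu^{-1}E)\|_{\mathrm{vec}(\infty)} &\le \|Q_{\bar T}\|_{\mathrm{vec}(\infty)} + \mu^{-1}\|E\|_{\mathrm{vec}(\infty)} \\
&\le \frac{1}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\gamma + \lambda\alpha(\rho)\beta(\rho) + \mu^{-1}\epsilon_{\mathrm{vec}(\infty)}\right) + \mu^{-1}\epsilon_{\mathrm{vec}(\infty)}.
\end{aligned}$$

The condition (15) now implies that this quantity is at most $\lambda/c$.

We also have

$$\begin{aligned}
\|Q_{\bar T}\|_{2\to2} &= \|\mathcal{P}_{\bar T}(Q_{\bar\Omega} + \mu^{-1}E) - \operatorname{orth}(\bar X_L)\|_{2\to2} \\
&\le \frac{2\alpha(\rho)}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\lambda + \gamma + \mu^{-1}\epsilon_{\mathrm{vec}(\infty)}\right) + 1 + 2\mu^{-1}\epsilon_{2\to2}
\end{aligned}$$

since $\|\mathcal{P}_{\bar T}(Q_{\bar\Omega})\|_{2\to2} \le 2\|Q_{\bar\Omega}\|_{2\to2}$ and $\|\mathcal{P}_{\bar T}(E)\|_{2\to2} \le 2\epsilon_{2\to2}$ by Lemma 5, and

$$\begin{aligned}
\|Q_{\bar\Omega}\|_{\mathrm{vec}(\infty)} &= \|\mathcal{P}_{\bar\Omega}(Q_{\bar T} + \mu^{-1}E) - \lambda\operatorname{sign}(\bar X_S)\|_{\mathrm{vec}(\infty)} \\
&\le \frac{1}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\lambda + \gamma + \mu^{-1}\epsilon_{\mathrm{vec}(\infty)}\right) + \lambda + \mu^{-1}\epsilon_{\mathrm{vec}(\infty)}.
\end{aligned}$$

The bounds on $\|Q_{\bar T}\|_*$ and $\|Q_{\bar\Omega}\|_{\mathrm{vec}(1)}$ follow from the facts that $\operatorname{rank}(Q_{\bar T}) \le 2\bar r$ and $|\operatorname{supp}(Q_{\bar\Omega})| \le k$. Finally,

$$\begin{aligned}
\|Q_{\bar\Omega} + Q_{\bar T}\|_{\mathrm{vec}(2)}^2 &= \langle Q_{\bar\Omega}, \mathcal{P}_{\bar\Omega}(Q_{\bar\Omega} + Q_{\bar T})\rangle + \langle Q_{\bar T}, \mathcal{P}_{\bar T}(Q_{\bar\Omega} + Q_{\bar T})\rangle \\
&= \langle Q_{\bar\Omega}, \lambda\operatorname{sign}(\bar X_S) - \mu^{-1}\mathcal{P}_{\bar\Omega}(E)\rangle + \langle Q_{\bar T}, \operatorname{orth}(\bar X_L) - \mu^{-1}\mathcal{P}_{\bar T}(E)\rangle \\
&\le \lambda\|Q_{\bar\Omega}\|_{\mathrm{vec}(1)}\left(1 + \mu^{-1}\lambda^{-1}\|\mathcal{P}_{\bar\Omega}(E)\|_{\mathrm{vec}(\infty)}\right) + \|Q_{\bar T}\|_*\left(1 + \mu^{-1}\|\mathcal{P}_{\bar T}(E)\|_{2\to2}\right) \\
&\le \lambda\|Q_{\bar\Omega}\|_{\mathrm{vec}(1)}\left(1 + \mu^{-1}\lambda^{-1}\epsilon_{\mathrm{vec}(\infty)}\right) + \|Q_{\bar T}\|_*\left(1 + 2\mu^{-1}\epsilon_{2\to2}\right). \ \square
\end{aligned}$$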

5 Analysis of constrained formulation 

Throughout this section, we fix a target decomposition $(\bar X_S, \bar X_L)$ that satisfies the constraints of (1), and let $(\hat X_S, \hat X_L)$ be the optimal solution to (1). Let $\Delta_S := \hat X_S - \bar X_S$ and $\Delta_L := \hat X_L - \bar X_L$. We show that under the conditions of Theorem 5 with $E = 0$ and appropriately chosen $\lambda$, solving (1) accurately recovers the target decomposition $(\bar X_S, \bar X_L)$.

We decompose the errors into symmetric and antisymmetric parts $\Delta_{\mathrm{avg}} := (\Delta_S + \Delta_L)/2$ and $\Delta_{\mathrm{mid}} := (\Delta_S - \Delta_L)/2$. The constraints allow us to easily bound $\Delta_{\mathrm{avg}}$, so most of the analysis involves bounding $\Delta_{\mathrm{mid}}$ in terms of $\Delta_{\mathrm{avg}}$.

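Lemma 11. $\|\Delta_{\mathrm{avg}}\|_{\mathrm{vec}(1)} \le \epsilon_{\mathrm{vec}(1)}$ and $\|\Delta_{\mathrm{avg}}\|_* \le \epsilon_*$.

Proof. Since both $(\hat X_S, \hat X_L)$ and $(\bar X_S, \bar X_L)$ are feasible solutions to (1), we have for $\diamond \in \{\mathrm{vec}(1), *\}$,

$$\begin{aligned}
\|\Delta_{\mathrm{avg}}\|_\diamond &= \tfrac12\|\Delta_S + \Delta_L\|_\diamond \\
&= \tfrac12\|(\hat X_S + \hat X_L - Y) - (\bar X_S + \bar X_L - Y)\|_\diamond \\
&\le \tfrac12\left(\|\hat X_S + \hat X_L - Y\|_\diamond + \|\bar X_S + \bar X_L - Y\|_\diamond\right) \\
&\le \epsilon_\diamond. \ \square
\end{aligned}$$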
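Lemma 12. Assume the conditions of Theorem 5 hold with $E = 0$. We have

$$\lambda\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_{\mathrm{mid}})\|_{\mathrm{vec}(1)} + \|\mathcal{P}_{\bar T^\perp}(\Delta_{\mathrm{mid}})\|_* \le (1 - 1/c)^{-1}\left(\lambda\|\Delta_{\mathrm{avg}}\|_{\mathrm{vec}(1)} + \|\Delta_{\mathrm{avg}}\|_*\right).$$

Proof. Let $Q := Q_{\bar\Omega} + Q_{\bar T}$ be the dual certificate guaranteed by Theorem 5. Note that $Q$ satisfies the conditions of Lemma 6. Applying that lemma at the point $(\bar X_S + \Delta_{\mathrm{mid}}, \bar X_L - \Delta_{\mathrm{mid}})$, for which the inner product term $\langle Q, \Delta_{\mathrm{mid}} - \Delta_{\mathrm{mid}}\rangle$ vanishes, we have

$$\lambda\|\bar X_S + \Delta_{\mathrm{mid}}\|_{\mathrm{vec}(1)} + \|\bar X_L - \Delta_{\mathrm{mid}}\|_* - \lambda\|\bar X_S\|_{\mathrm{vec}(1)} - \|\bar X_L\|_* \ge (1 - 1/c)\left(\lambda\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_{\mathrm{mid}})\|_{\mathrm{vec}(1)} + \|\mathcal{P}_{\bar T^\perp}(\Delta_{\mathrm{mid}})\|_*\right).$$

Using the triangle inequality, we have

$$\begin{aligned}
\lambda\|\hat X_S\|_{\mathrm{vec}(1)} + \|\hat X_L\|_* &= \lambda\|\bar X_S + \Delta_S\|_{\mathrm{vec}(1)} + \|\bar X_L + \Delta_L\|_* \\
&= \lambda\|\bar X_S + \Delta_{\mathrm{mid}} + \Delta_{\mathrm{avg}}\|_{\mathrm{vec}(1)} + \|\bar X_L - \Delta_{\mathrm{mid}} + \Delta_{\mathrm{avg}}\|_* \\
&\ge \lambda\|\bar X_S + \Delta_{\mathrm{mid}}\|_{\mathrm{vec}(1)} - \lambda\|\Delta_{\mathrm{avg}}\|_{\mathrm{vec}(1)} + \|\bar X_L - \Delta_{\mathrm{mid}}\|_* - \|\Delta_{\mathrm{avg}}\|_*.
\end{aligned}$$

Now using the fact that $\lambda\|\hat X_S\|_{\mathrm{vec}(1)} + \|\hat X_L\|_* \le \lambda\|\bar X_S\|_{\mathrm{vec}(1)} + \|\bar X_L\|_*$ gives the claim. □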
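Lemma 13. Let $k := |\operatorname{supp}(\bar X_S)|$. Assume the conditions of Theorem 5 hold with $E = 0$. We have

$$\|\mathcal{P}_{\bar\Omega}(\Delta_{\mathrm{mid}})\|_{\mathrm{vec}(1)} \le \frac{(1 - 1/c)^{-1}}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\|\Delta_{\mathrm{avg}}\|_{\mathrm{vec}(1)} + \|\Delta_{\mathrm{avg}}\|_*/\lambda\right).$$

Proof. Because $\Delta_{\mathrm{mid}} = \mathcal{P}_{\bar\Omega}(\Delta_{\mathrm{mid}}) + \mathcal{P}_{\bar\Omega^\perp}(\Delta_{\mathrm{mid}}) = \mathcal{P}_{\bar T}(\Delta_{\mathrm{mid}}) + \mathcal{P}_{\bar T^\perp}(\Delta_{\mathrm{mid}})$, we have the equation

$$\mathcal{P}_{\bar\Omega}(\Delta_{\mathrm{mid}}) - \mathcal{P}_{\bar T}(\Delta_{\mathrm{mid}}) = -\mathcal{P}_{\bar\Omega^\perp}(\Delta_{\mathrm{mid}}) + \mathcal{P}_{\bar T^\perp}(\Delta_{\mathrm{mid}}).$$

Separately applying $\mathcal{P}_{\bar\Omega}$ and $\mathcal{P}_{\bar T}$ to both sides gives

$$\begin{bmatrix} \mathcal{I} & \mathcal{P}_{\bar\Omega} \\ \mathcal{P}_{\bar T} & \mathcal{I} \end{bmatrix} \begin{bmatrix} \mathcal{P}_{\bar\Omega}(\Delta_{\mathrm{mid}}) \\ -\mathcal{P}_{\bar T}(\Delta_{\mathrm{mid}}) \end{bmatrix} = \begin{bmatrix} (\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(\Delta_{\mathrm{mid}}) \\ -(\mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega^\perp})(\Delta_{\mathrm{mid}}) \end{bmatrix}.$$

Under the condition $\alpha(\rho)\beta(\rho) < 1$, Lemma 10 and Lemma 4 imply that

$$\|(\mathcal{I} - \mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T})^{-1}\|_{\mathrm{vec}(1)\to\mathrm{vec}(1)} \le \frac{1}{1 - \alpha(\rho)\beta(\rho)}$$

and that

$$\mathcal{P}_{\bar\Omega}(\Delta_{\mathrm{mid}}) = (\mathcal{I} - \mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T})^{-1}\left((\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(\Delta_{\mathrm{mid}}) + (\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega^\perp})(\Delta_{\mathrm{mid}})\right).$$

Therefore

$$\begin{aligned}
\|\mathcal{P}_{\bar\Omega}(\Delta_{\mathrm{mid}})\|_{\mathrm{vec}(1)}
&\le \frac{1}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(\Delta_{\mathrm{mid}})\|_{\mathrm{vec}(1)} + \|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega^\perp})(\Delta_{\mathrm{mid}})\|_{\mathrm{vec}(1)}\right) \\
&\le \frac{1}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\sqrt{k}\,\|\mathcal{P}_{\bar T^\perp}(\Delta_{\mathrm{mid}})\|_{\mathrm{vec}(2)} + \alpha(\rho)\beta(\rho)\cdot\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_{\mathrm{mid}})\|_{\mathrm{vec}(1)}\right) && \text{(Lemma 10)} \\
&\le \frac{1}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\sqrt{k}\,\|\mathcal{P}_{\bar T^\perp}(\Delta_{\mathrm{mid}})\|_* + \alpha(\rho)\beta(\rho)\cdot\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_{\mathrm{mid}})\|_{\mathrm{vec}(1)}\right) \\
&\le \frac{\max\{\sqrt{k},\ \alpha(\rho)\beta(\rho)/\lambda\}\,(1 - 1/c)^{-1}}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\lambda\|\Delta_{\mathrm{avg}}\|_{\mathrm{vec}(1)} + \|\Delta_{\mathrm{avg}}\|_*\right) && \text{(Lemma 12)} \\
&\le \frac{(1 - 1/c)^{-1}}{1 - \alpha(\rho)\beta(\rho)}\cdot\left(\|\Delta_{\mathrm{avg}}\|_{\mathrm{vec}(1)} + \|\Delta_{\mathrm{avg}}\|_*/\lambda\right)
\end{aligned}$$

where the last inequality uses the facts $k \le \alpha(\rho)^2$, $\alpha(\rho)\beta(\rho) < 1$, and $\lambda\alpha(\rho) \le 1$. □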

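We now prove Theorem 2, which we restate here for convenience.

Theorem 6 (Theorem 2 restated). Assume the conditions of Theorem 5 hold with $E = 0$. We have

$$\max\{\|\Delta_S\|_{\mathrm{vec}(1)}, \|\Delta_L\|_{\mathrm{vec}(1)}\} \le \left(1 + (1 - 1/c)^{-1}\cdot\frac{2 - \alpha(\rho)\beta(\rho)}{1 - \alpha(\rho)\beta(\rho)}\right)\cdot\epsilon_{\mathrm{vec}(1)} + (1 - 1/c)^{-1}\cdot\frac{2 - \alpha(\rho)\beta(\rho)}{1 - \alpha(\rho)\beta(\rho)}\cdot\epsilon_*/\lambda.$$

If, in addition, for some $b \ge \|\bar X_L\|_{\mathrm{vec}(\infty)}$, either:

• the optimization problem (1) is augmented with the constraint $\|X_L\|_{\mathrm{vec}(\infty)} \le b$, or

• $\hat X_L$ is post-processed by replacing $[\hat X_L]_{i,j}$ with $\min\{\max\{[\hat X_L]_{i,j}, -b\}, b\}$ for all $i, j$,

then we also have

$$\|\Delta_L\|_{\mathrm{vec}(2)} \le \min\left\{\|\Delta_L\|_{\mathrm{vec}(1)},\ \sqrt{2b\|\Delta_L\|_{\mathrm{vec}(1)}}\right\}.$$

Proof. First note that since $\Delta_S = \Delta_{\mathrm{avg}} + \Delta_{\mathrm{mid}}$ and $\Delta_L = \Delta_{\mathrm{avg}} - \Delta_{\mathrm{mid}}$, we have $\max\{\|\Delta_S\|_{\mathrm{vec}(1)}, \|\Delta_L\|_{\mathrm{vec}(1)}\} \le \|\Delta_{\mathrm{avg}}\|_{\mathrm{vec}(1)} + \|\Delta_{\mathrm{mid}}\|_{\mathrm{vec}(1)}$. We can bound $\|\Delta_{\mathrm{mid}}\|_{\mathrm{vec}(1)}$ as

$$\begin{aligned}
\|\Delta_{\mathrm{mid}}\|_{\mathrm{vec}(1)} &\le \|\mathcal{P}_{\bar\Omega^\perp}(\Delta_{\mathrm{mid}})\|_{\mathrm{vec}(1)} + \|\mathcal{P}_{\bar\Omega}(\Delta_{\mathrm{mid}})\|_{\mathrm{vec}(1)} \\
&\le (1 - 1/c)^{-1}\cdot\left(1 + \frac{1}{1 - \alpha(\rho)\beta(\rho)}\right)\cdot\left(\|\Delta_{\mathrm{avg}}\|_{\mathrm{vec}(1)} + \|\Delta_{\mathrm{avg}}\|_*/\lambda\right)
\end{aligned}$$

by Lemma 12 and Lemma 13. The bounds on $\|\Delta_S\|_{\mathrm{vec}(1)}$ and $\|\Delta_L\|_{\mathrm{vec}(1)}$ follow from the bounds on $\|\Delta_{\mathrm{mid}}\|_{\mathrm{vec}(1)}$, $\|\Delta_{\mathrm{avg}}\|_{\mathrm{vec}(1)}$, and $\|\Delta_{\mathrm{avg}}\|_*$ (from Lemma 11).

If the constraint $\|X_L\|_{\mathrm{vec}(\infty)} \le b$ is added, then we can use the facts $\|\Delta_L\|_{\mathrm{vec}(\infty)} \le \|\hat X_L\|_{\mathrm{vec}(\infty)} + \|\bar X_L\|_{\mathrm{vec}(\infty)} \le 2b$ and $\|\Delta_L\|_{\mathrm{vec}(2)} \le \sqrt{\|\Delta_L\|_{\mathrm{vec}(\infty)}\|\Delta_L\|_{\mathrm{vec}(1)}} \le \sqrt{2b\|\Delta_L\|_{\mathrm{vec}(1)}}$. If $\hat X_L$ is post-processed, then (letting $\operatorname{clip}(\hat X_L)$ be the result of the post-processing) $|\operatorname{clip}(\hat X_L)_{i,j} - [\bar X_L]_{i,j}| \le |[\hat X_L]_{i,j} - [\bar X_L]_{i,j}|$ for all $i, j$, so $\|\operatorname{clip}(\hat X_L) - \bar X_L\|_{\mathrm{vec}(1)} \le \|\Delta_L\|_{\mathrm{vec}(1)}$ and $\|\operatorname{clip}(\hat X_L) - \bar X_L\|_{\mathrm{vec}(2)} \le \sqrt{2b\|\operatorname{clip}(\hat X_L) - \bar X_L\|_{\mathrm{vec}(1)}}$. □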
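The post-processing step is an entrywise clip to $[-b, b]$. The sketch below applies it with `numpy.clip` and checks the two facts used in the proof: clipping never increases any entrywise error when the target lies within the bounds, and the $\mathrm{vec}(2)$ error is controlled by the $\mathrm{vec}(1)$ error. The target, the noisy estimate, and the value of $b$ are made-up inputs for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
b = 1.0
X_bar = rng.uniform(-b, b, size=(6, 5))        # target with ||X_bar||_vec(inf) <= b
X_hat = X_bar + rng.standard_normal((6, 5))    # hypothetical estimate of X_bar
X_clip = np.clip(X_hat, -b, b)                 # min{max{x, -b}, b} entrywise

err_hat = np.abs(X_hat - X_bar)
err_clip = np.abs(X_clip - X_bar)
l1 = err_clip.sum()                            # vec(1) error after clipping
l2 = np.sqrt((err_clip ** 2).sum())            # vec(2) error after clipping

print(np.all(err_clip <= err_hat + 1e-12))
print(l2 <= min(l1, np.sqrt(2 * b * l1)) + 1e-12)
```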
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6 Analysis of regularized formulation 

Throughout this section, we fix a target decomposition $(\bar X_S, \bar X_L)$ that satisfies $\|\bar X_S - Y\|_{\mathrm{vec}(\infty)} \le b$, and let $(\hat X_S, \hat X_L)$ be the optimal solution to (2) augmented with the constraint $\|X_S - Y\|_{\mathrm{vec}(\infty)} \le b$ for some $b \ge \|\bar X_S - Y\|_{\mathrm{vec}(\infty)}$ ($b = \infty$ is allowed). Let $\Delta_S := \hat X_S - \bar X_S$ and $\Delta_L := \hat X_L - \bar X_L$. We show that under the conditions of Theorem 5 with $E = Y - (\bar X_S + \bar X_L)$ and appropriately chosen $\lambda$ and $\mu$, solving (2) accurately recovers the target decomposition $(\bar X_S, \bar X_L)$.

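Lemma 14. There exist $G_S, G_L, H_S \in \mathbb{R}^{m\times n}$ such that

1. $\mu^{-1}(\hat X_S + \hat X_L - Y) + \lambda G_S + H_S = 0$; $\|G_S\|_{\mathrm{vec}(\infty)} \le 1$;

2. $\mu^{-1}(\hat X_S + \hat X_L - Y) + G_L = 0$; $\|G_L\|_{2\to2} \le 1$;

3. $[H_S]_{i,j}[\Delta_S]_{i,j} \ge 0 \ \ \forall i, j$.

Proof. We express the constraint $\|X_S - Y\|_{\mathrm{vec}(\infty)} \le b$ in (2) as $2mn$ constraints $[X_S]_{i,j} - Y_{i,j} - b \le 0$ and $-[X_S]_{i,j} + Y_{i,j} - b \le 0$ for all $i, j$. Now the corresponding Lagrangian is

$$\frac{1}{2\mu}\|X_S + X_L - Y\|_{\mathrm{vec}(2)}^2 + \lambda\|X_S\|_{\mathrm{vec}(1)} + \|X_L\|_* + \langle \Lambda^+, X_S - Y - b\mathbf{1}_{m,n}\rangle + \langle \Lambda^-, -X_S + Y - b\mathbf{1}_{m,n}\rangle$$

where $\Lambda^+, \Lambda^- \ge 0$ and $\mathbf{1}_{m,n}$ is the all-ones $m \times n$ matrix. First-order optimality conditions imply that there exists a subgradient $G_S$ of $\|X_S\|_{\mathrm{vec}(1)}$ at $X_S = \hat X_S$ and a subgradient $G_L$ of $\|X_L\|_*$ at $X_L = \hat X_L$ such that

$$\mu^{-1}(\hat X_S + \hat X_L - Y) + \lambda G_S + (\Lambda^+ - \Lambda^-) = 0 \quad \text{and} \quad \mu^{-1}(\hat X_S + \hat X_L - Y) + G_L = 0.$$

Now since $\|\bar X_S - Y\|_{\mathrm{vec}(\infty)} \le b$, we have $[\bar X_S]_{i,j} \le Y_{i,j} + b$ and $-[\bar X_S]_{i,j} \le -Y_{i,j} + b$. By complementary slackness, if $\Lambda^+_{i,j} > 0$, then $[\hat X_S]_{i,j} - Y_{i,j} - b = 0$, which means $[\hat X_S]_{i,j} - [\bar X_S]_{i,j} \ge [\hat X_S]_{i,j} - (Y_{i,j} + b) = 0$. So $\Lambda^+_{i,j}[\Delta_S]_{i,j} \ge 0$. Similarly, if $\Lambda^-_{i,j} > 0$, then $[\hat X_S]_{i,j} - [\bar X_S]_{i,j} \le 0$. So $\Lambda^-_{i,j}[\Delta_S]_{i,j} \le 0$. Therefore $H_S := \Lambda^+ - \Lambda^-$ satisfies $[H_S]_{i,j}[\Delta_S]_{i,j} \ge 0$. □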

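Lemma 15. Assume the conditions of Theorem 5 hold with $E = Y - (\bar X_S + \bar X_L)$, and let $(Q_{\bar\Omega}, Q_{\bar T})$ be the dual certificate from the conclusion. We have

$$\lambda\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\|_{\mathrm{vec}(1)} + \|\mathcal{P}_{\bar T^\perp}(\Delta_L)\|_* \le (1 - 1/c)^{-1}\|Q_{\bar\Omega} + Q_{\bar T}\|_{\mathrm{vec}(2)}^2\,\mu/2.$$

Proof. Let $Q := Q_{\bar\Omega} + Q_{\bar T}$ and $\Delta := \Delta_S + \Delta_L$. Since $Q + \mu^{-1}E$ satisfies the conditions of Lemma 6,

$$(1 - 1/c)\left(\lambda\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\|_{\mathrm{vec}(1)} + \|\mathcal{P}_{\bar T^\perp}(\Delta_L)\|_*\right) \le \left(\lambda\|\hat X_S\|_{\mathrm{vec}(1)} + \|\hat X_L\|_*\right) - \left(\lambda\|\bar X_S\|_{\mathrm{vec}(1)} + \|\bar X_L\|_*\right) - \langle Q + \mu^{-1}E, \Delta_S + \Delta_L\rangle.$$

Furthermore, by the optimality of $(\hat X_S, \hat X_L)$,

$$\begin{aligned}
\left(\lambda\|\hat X_S\|_{\mathrm{vec}(1)} + \|\hat X_L\|_*\right) - \left(\lambda\|\bar X_S\|_{\mathrm{vec}(1)} + \|\bar X_L\|_*\right) &\le \frac{1}{2\mu}\|\bar X_S + \bar X_L - Y\|_{\mathrm{vec}(2)}^2 - \frac{1}{2\mu}\|\hat X_S + \hat X_L - Y\|_{\mathrm{vec}(2)}^2 \\
&= \frac{1}{2\mu}\|E\|_{\mathrm{vec}(2)}^2 - \frac{1}{2\mu}\|\Delta_S + \Delta_L - E\|_{\mathrm{vec}(2)}^2 \\
&= \frac{1}{2\mu}\left(2\langle E, \Delta\rangle - \langle \Delta, \Delta\rangle\right).
\end{aligned}$$

Combining the inequalities gives

$$(1 - 1/c)\left(\lambda\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\|_{\mathrm{vec}(1)} + \|\mathcal{P}_{\bar T^\perp}(\Delta_L)\|_*\right) \le -\langle Q, \Delta\rangle - \frac{1}{2\mu}\langle \Delta, \Delta\rangle \le \|Q\|_{\mathrm{vec}(2)}^2\,\mu/2$$

where the last inequality follows by taking the maximum value over $\Delta$ at $\Delta = -\mu Q$. □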
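The final step of this proof maximizes a concave quadratic in $\Delta$. A quick numerical check of that step is sketched below: the value at $\Delta = -\mu Q$ equals $\mu\|Q\|_{\mathrm{vec}(2)}^2/2$, and random perturbations of that point never exceed it. The matrix `Q`, the value of `mu`, and the perturbations are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = 0.7
Q = rng.standard_normal((5, 4))

def objective(Delta):
    """-<Q, Delta> - <Delta, Delta> / (2 mu)."""
    return -np.sum(Q * Delta) - np.sum(Delta * Delta) / (2 * mu)

best = objective(-mu * Q)            # value at the claimed maximizer
bound = mu * np.sum(Q * Q) / 2       # mu * ||Q||_vec(2)^2 / 2
trials = [objective(-mu * Q + rng.standard_normal(Q.shape)) for _ in range(100)]

print(np.isclose(best, bound), max(trials) <= bound + 1e-12)
```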
Now we prove Theorem 3, restated below (with an additional result for $\|\Delta_L\|_{\flat(\rho)}$).

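Theorem 7 (Theorem 3 restated). Let $k := |\operatorname{supp}(\bar X_S)|$ and $\bar r := \operatorname{rank}(\bar X_L)$. Assume the conditions of Theorem 5 hold with $E = Y - (\bar X_S + \bar X_L)$, and let $(Q_{\bar\Omega}, Q_{\bar T})$ be the dual certificate from the conclusion. We have

$$\begin{aligned}
\|\Delta_S\|_{\mathrm{vec}(1)} &\le \frac{\lambda^{-1}(1 - 1/c)^{-1}\|Q_{\bar\Omega} + Q_{\bar T}\|_{\mathrm{vec}(2)}^2\,\mu + \lambda k\mu + 2\sqrt{k\bar r}\,\mu + k\|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(E)\|_{\mathrm{vec}(\infty)}}{1 - \alpha(\rho)\beta(\rho)} \\
\|\Delta_S\|_{\mathrm{vec}(2)} &\le \min\left\{\|\Delta_S\|_{\mathrm{vec}(1)},\ \sqrt{2b\|\Delta_S\|_{\mathrm{vec}(1)}}\right\} \\
\|\Delta_L\|_{\flat(\rho)} &\le (1 - 1/c)^{-1}\|Q_{\bar\Omega} + Q_{\bar T}\|_{\mathrm{vec}(2)}^2\,\mu/2 + \min\left\{\beta(\rho)\|\Delta_S\|_{\mathrm{vec}(1)},\ \sqrt{2\bar r}\,\|\Delta_S\|_{\mathrm{vec}(2)}\right\} + \|\mathcal{P}_{\bar T}(E)\|_* + 2\bar r\mu \\
\|\Delta_L\|_* &\le (1 - 1/c)^{-1}\|Q_{\bar\Omega} + Q_{\bar T}\|_{\mathrm{vec}(2)}^2\,\mu/2 + \sqrt{2\bar r}\,\|\Delta_S\|_{\mathrm{vec}(2)} + \|\mathcal{P}_{\bar T}(E)\|_* + 2\bar r\mu.
\end{aligned}$$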

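Proof. From Lemma 14, we obtain $G_S, G_L, H_S \in \mathbb{R}^{m\times n}$ and the following equations:

$$\mu^{-1}\left(\mathcal{P}_{\bar\Omega}(\Delta_S) + \mathcal{P}_{\bar\Omega}(\Delta_L) - \mathcal{P}_{\bar\Omega}(E)\right) + \mathcal{P}_{\bar\Omega}(H_S) = -\lambda\mathcal{P}_{\bar\Omega}(G_S) \tag{16}$$

$$\mu^{-1}\left(\mathcal{P}_{\bar T}(\Delta_S) + \mathcal{P}_{\bar T}(\Delta_L) - \mathcal{P}_{\bar T}(E)\right) = -\mathcal{P}_{\bar T}(G_L) \tag{17}$$

$$\mu^{-1}\left((\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T})(\Delta_S) + (\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T})(\Delta_L) - (\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T})(E)\right) = -(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T})(G_L). \tag{18}$$

Subtracting (18) from (16) gives

$$\begin{aligned}
&\mu^{-1}\left(\mathcal{P}_{\bar\Omega}(\Delta_S) - (\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega})(\Delta_S) - (\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega^\perp})(\Delta_S) + (\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(\Delta_L)\right) + \mathcal{P}_{\bar\Omega}(H_S) \\
&\quad= -\lambda\mathcal{P}_{\bar\Omega}(G_S) + (\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T})(G_L) + \mu^{-1}(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(E).
\end{aligned}$$

Moreover, we have $\langle \operatorname{sign}(\Delta_S), \mathcal{P}_{\bar\Omega}(\Delta_S)\rangle = \|\mathcal{P}_{\bar\Omega}(\Delta_S)\|_{\mathrm{vec}(1)}$ and $\langle \operatorname{sign}(\Delta_S), \mathcal{P}_{\bar\Omega}(H_S)\rangle = \|\mathcal{P}_{\bar\Omega}(H_S)\|_{\mathrm{vec}(1)}$, so taking inner products with $\operatorname{sign}(\Delta_S)$ on both sides of the equation gives

$$\begin{aligned}
&\mu^{-1}\|\mathcal{P}_{\bar\Omega}(\Delta_S)\|_{\mathrm{vec}(1)} + \|\mathcal{P}_{\bar\Omega}(H_S)\|_{\mathrm{vec}(1)} \\
&\quad\le \mu^{-1}\|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega})(\Delta_S)\|_{\mathrm{vec}(1)} + \mu^{-1}\|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T} \circ \mathcal{P}_{\bar\Omega^\perp})(\Delta_S)\|_{\mathrm{vec}(1)} + \mu^{-1}\|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(\Delta_L)\|_{\mathrm{vec}(1)} \\
&\qquad+ \lambda\|\mathcal{P}_{\bar\Omega}(G_S)\|_{\mathrm{vec}(1)} + \|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T})(G_L)\|_{\mathrm{vec}(1)} + \mu^{-1}\|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(E)\|_{\mathrm{vec}(1)} \\
&\quad\le \mu^{-1}\alpha(\rho)\beta(\rho)\|\mathcal{P}_{\bar\Omega}(\Delta_S)\|_{\mathrm{vec}(1)} + \mu^{-1}\alpha(\rho)\beta(\rho)\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\|_{\mathrm{vec}(1)} + \mu^{-1}\sqrt{k}\,\|\mathcal{P}_{\bar T^\perp}(\Delta_L)\|_{\mathrm{vec}(2)} \\
&\qquad+ \lambda k + \sqrt{k}\,\|\mathcal{P}_{\bar T}(G_L)\|_{\mathrm{vec}(2)} + \mu^{-1}k\|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(E)\|_{\mathrm{vec}(\infty)} \\
&\quad\le \mu^{-1}\alpha(\rho)\beta(\rho)\|\mathcal{P}_{\bar\Omega}(\Delta_S)\|_{\mathrm{vec}(1)} + \mu^{-1}\alpha(\rho)\beta(\rho)\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\|_{\mathrm{vec}(1)} + \mu^{-1}\sqrt{k}\,\|\mathcal{P}_{\bar T^\perp}(\Delta_L)\|_{\mathrm{vec}(2)} \\
&\qquad+ \lambda k + 2\sqrt{k\bar r}\,\|G_L\|_{2\to2} + \mu^{-1}k\|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(E)\|_{\mathrm{vec}(\infty)} \\
&\quad\le \mu^{-1}\alpha(\rho)\beta(\rho)\|\mathcal{P}_{\bar\Omega}(\Delta_S)\|_{\mathrm{vec}(1)} + \mu^{-1}\alpha(\rho)\beta(\rho)\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\|_{\mathrm{vec}(1)} + \mu^{-1}\sqrt{k}\,\|\mathcal{P}_{\bar T^\perp}(\Delta_L)\|_{\mathrm{vec}(2)} \\
&\qquad+ \lambda k + 2\sqrt{k\bar r} + \mu^{-1}k\|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(E)\|_{\mathrm{vec}(\infty)}.
\end{aligned}$$

The second and third inequalities above follow from Lemma 5 and Lemma 10, and the fourth inequality uses the fact that $\|G_L\|_{2\to2} \le 1$. Rearranging the inequality and applying Lemma 15 gives

$$\begin{aligned}
&(1 - \alpha(\rho)\beta(\rho))\|\mathcal{P}_{\bar\Omega}(\Delta_S)\|_{\mathrm{vec}(1)} \\
&\quad\le \alpha(\rho)\beta(\rho)\|\mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\|_{\mathrm{vec}(1)} + \sqrt{k}\,\|\mathcal{P}_{\bar T^\perp}(\Delta_L)\|_{\mathrm{vec}(2)} + \lambda k\mu + 2\sqrt{k\bar r}\,\mu + k\|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(E)\|_{\mathrm{vec}(\infty)} \\
&\quad\le \max\{\alpha(\rho)\beta(\rho)/\lambda,\ \sqrt{k}\}\,(1 - 1/c)^{-1}\|Q_{\bar\Omega} + Q_{\bar T}\|_{\mathrm{vec}(2)}^2\,\mu/2 + \lambda k\mu + 2\sqrt{k\bar r}\,\mu + k\|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(E)\|_{\mathrm{vec}(\infty)} \\
&\quad\le \lambda^{-1}(1 - 1/c)^{-1}\|Q_{\bar\Omega} + Q_{\bar T}\|_{\mathrm{vec}(2)}^2\,\mu/2 + \lambda k\mu + 2\sqrt{k\bar r}\,\mu + k\|(\mathcal{P}_{\bar\Omega} \circ \mathcal{P}_{\bar T^\perp})(E)\|_{\mathrm{vec}(\infty)}
\end{aligned}$$

since $k \le \alpha(\rho)^2$, $\alpha(\rho)\beta(\rho) < 1$, and $\lambda\alpha(\rho) \le 1$. Now we combine this with $\|\Delta_S\|_{\mathrm{vec}(1)} \le \|\mathcal{P}_{\bar\Omega^\perp}(\Delta_S)\|_{\mathrm{vec}(1)} + \|\mathcal{P}_{\bar\Omega}(\Delta_S)\|_{\mathrm{vec}(1)}$ and Lemma 15 to get the first bound.

For the second bound, we use the facts $\|\Delta_S\|_{\mathrm{vec}(\infty)} \le \|\hat X_S - Y\|_{\mathrm{vec}(\infty)} + \|\bar X_S - Y\|_{\mathrm{vec}(\infty)} \le 2b$ and $\|\Delta_S\|_{\mathrm{vec}(2)} \le \sqrt{\|\Delta_S\|_{\mathrm{vec}(\infty)}\|\Delta_S\|_{\mathrm{vec}(1)}}$.

For the third and fourth bounds, we obtain from (17)

$$\begin{aligned}
\|\mathcal{P}_{\bar T}(\Delta_L)\|_{\flat(\rho)} &\le \|\mathcal{P}_{\bar T}(\Delta_S)\|_{\flat(\rho)} + \|\mathcal{P}_{\bar T}(E)\|_{\flat(\rho)} + \mu\|\mathcal{P}_{\bar T}(G_L)\|_{\flat(\rho)} \\
&\le \|\mathcal{P}_{\bar T}\|_{\mathrm{vec}(1)\to\flat(\rho)}\|\Delta_S\|_{\mathrm{vec}(1)} + \|\mathcal{P}_{\bar T}(E)\|_* + \mu\|\mathcal{P}_{\bar T}(G_L)\|_* && \text{(Lemma 3)} \\
&= \|\mathcal{P}_{\bar T}\|_{\sharp(\rho)\to\mathrm{vec}(\infty)}\|\Delta_S\|_{\mathrm{vec}(1)} + \|\mathcal{P}_{\bar T}(E)\|_* + \mu\|\mathcal{P}_{\bar T}(G_L)\|_* && \text{(Proposition 3)} \\
&\le \beta(\rho)\|\Delta_S\|_{\mathrm{vec}(1)} + \|\mathcal{P}_{\bar T}(E)\|_* + \mu\|\mathcal{P}_{\bar T}(G_L)\|_* && \text{(Lemma 8)} \\
&\le \beta(\rho)\|\Delta_S\|_{\mathrm{vec}(1)} + \|\mathcal{P}_{\bar T}(E)\|_* + 2\bar r\mu && \text{(Lemma 5 and } \|G_L\|_{2\to2} \le 1\text{)}
\end{aligned}$$

and

$$\begin{aligned}
\|\mathcal{P}_{\bar T}(\Delta_L)\|_* &\le \|\mathcal{P}_{\bar T}(\Delta_S)\|_* + \|\mathcal{P}_{\bar T}(E)\|_* + \mu\|\mathcal{P}_{\bar T}(G_L)\|_* \\
&\le \sqrt{2\bar r}\,\|\Delta_S\|_{\mathrm{vec}(2)} + \|\mathcal{P}_{\bar T}(E)\|_* + 2\bar r\mu && \text{(Lemma 5 and } \|G_L\|_{2\to2} \le 1\text{)}.
\end{aligned}$$

Now we combine these with

$$\begin{aligned}
\|\Delta_L\|_{\flat(\rho)} &\le \|\mathcal{P}_{\bar T^\perp}(\Delta_L)\|_{\flat(\rho)} + \|\mathcal{P}_{\bar T}(\Delta_L)\|_{\flat(\rho)} \\
&\le \|\mathcal{P}_{\bar T^\perp}(\Delta_L)\|_* + \min\left\{\|\mathcal{P}_{\bar T}(\Delta_L)\|_*,\ \|\mathcal{P}_{\bar T}(\Delta_L)\|_{\flat(\rho)}\right\} && \text{(Lemma 3)} \\
\|\Delta_L\|_* &\le \|\mathcal{P}_{\bar T^\perp}(\Delta_L)\|_* + \|\mathcal{P}_{\bar T}(\Delta_L)\|_*
\end{aligned}$$

and Lemma 15. □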

Note that we have an error bound for $\Delta_L$ in the $\|\cdot\|_{\flat(\rho)}$ norm, which can be significantly smaller than the bound for the trace norm of $\Delta_L$.
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